Objective¶
The main goal of this project is to analyze traffic, pedestrian data, and weather conditions to identify patterns and factors that contribute to pedestrian safety. This analysis will help in enhancing safety measures and improving walking conditions.
Acceptance Criteria¶
Data Collection and Integration¶
- Sources: The system must integrate data from various sources including:
- Weather conditions (temperature, UV index, rainfall).
- Pedestrian counts.
- Specific geographic locations.
- Detailed topographical data to assess the steepness of pedestrian paths.
- Timeliness: Data should be updated over time to reflect the most current information available, ideally covering the past several months.
Data Analysis and Reporting¶
- Regression Analysis: Implement a regression model to understand how various factors, such as weather conditions and specific locations, impact pedestrian safety.
- Correlation Analysis: Use correlation matrices to identify variables that are highly correlated to address potential issues of multicollinearity.
- Pathway Calculation: Develop algorithms to calculate the safest and most efficient pathways, minimizing steepness and exposure to potential hazards.
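To make the correlation check concrete, here is a minimal sketch on synthetic weather-style columns; the column names, the 0.8 threshold, and the generated data are all illustrative assumptions, not taken from the project's datasets:

```python
import numpy as np
import pandas as pd

def high_correlation_pairs(df, threshold=0.8):
    """Return (col_a, col_b, r) for pairs whose absolute Pearson r exceeds threshold."""
    corr = df.corr().abs()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j], round(corr.iloc[i, j], 3)))
    return pairs

# Synthetic illustration: two nearly collinear predictors plus an independent one
rng = np.random.default_rng(0)
temp = rng.normal(15, 5, 200)
df = pd.DataFrame({
    'airtemperature': temp,
    'uv_index': temp * 0.3 + rng.normal(0, 0.1, 200),  # strongly tied to temperature
    'rainfall': rng.exponential(1.0, 200),
})
flagged = high_correlation_pairs(df)
```

Pairs flagged this way are candidates for dropping one member, or for the regularization discussed under Model Optimization below.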
Route Optimization and Mapping¶
- GIS Technology: Utilize Geographic Information Systems technology to map out optimized safety routes based on model findings.
- Alternative Routes: Provide alternative routes that balance steepness with environmental and urban factors, catering to personal preferences.
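As one way steepness could be traded off against distance in route calculation, the sketch below builds a toy footpath graph and weights each edge by distance plus a gradient penalty. The graph, node names, and the `GRADE_PENALTY` constant are all hypothetical; a real implementation would tune the penalty from data:

```python
import networkx as nx

# Toy footpath graph: nodes are intersections; each edge carries a
# distance in metres and a gradient in percent.
G = nx.Graph()
edges = [
    # (u, v, distance_m, grade_pc)
    ('A', 'B', 100, 1.0),
    ('B', 'D', 100, 1.0),
    ('A', 'C', 90, 8.0),   # shorter but much steeper
    ('C', 'D', 90, 8.0),
]
GRADE_PENALTY = 20  # extra "metres" of cost per percent of grade (made-up constant)
for u, v, dist, grade in edges:
    G.add_edge(u, v, cost=dist + GRADE_PENALTY * grade)

# Dijkstra over the combined cost prefers the gentler, slightly longer route
route = nx.shortest_path(G, 'A', 'D', weight='cost')
```

Varying `GRADE_PENALTY` is one way to generate the alternative routes mentioned above: a low penalty favours the shortest path, a high one favours flat paths.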
Model Optimization¶
- Dimensionality Reduction: Apply PCA to manage data efficiency and complexity.
- Regularization Methods: Incorporate Ridge or Lasso regularization to handle multicollinearity and improve model performance.
- Feature Selection: Develop a feature selection strategy to eliminate redundant or irrelevant features to enhance model accuracy.
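The three optimization steps above compose naturally into a single scikit-learn pipeline. This is a sketch on synthetic data (the feature matrix, the 95% variance cutoff, and `alpha=1.0` are illustrative assumptions, not project choices):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

# Synthetic stand-in for a weather/steepness feature matrix
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))
X[:, 5] = X[:, 0] + rng.normal(0, 0.01, 300)  # near-duplicate column (multicollinearity)
y = X[:, 0] * 2.0 + X[:, 1] - X[:, 2] + rng.normal(0, 0.5, 300)

# Scale, keep components explaining 95% of variance, then fit a ridge regression
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), Ridge(alpha=1.0))
model.fit(X, y)
r2 = model.score(X, y)
n_components = model.named_steps['pca'].n_components_
```

Here PCA drops the near-null direction created by the duplicated column, so the redundant feature is absorbed rather than inflating coefficient variance.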
Visualization and Decision Support¶
- Visualization Dashboard: Develop a dashboard to display traffic and pedestrian safety metrics across different times and locations, including heatmaps to highlight key correlations and trends.
- Interactive Map: Create an interactive map or application that provides route recommendations based on model insights.
Feedback and Iteration¶
- Feedback Loops: Implement feedback loops to monitor the outcomes of safety measures, re-analyze data, and refine models based on effectiveness.
Technical Notes¶
- Data Privacy and Security: Ensure the privacy and security of data, especially with real-time data integration.
- Scalability: Consider the scalability of the data processing infrastructure to handle increasing data volume.
- Data Accuracy: Ensure that the steepness data is accurate and regularly updated to reflect current pathway conditions.
- Accessibility: Consider the needs of all users, including those with disabilities, to ensure that routes are universally accessible.
At the end of this use case, I will have demonstrated a broad range of skills essential for data-driven urban planning and public safety enhancement. These include Data Integration, where I'll show the ability to merge and utilise data from diverse sources such as weather conditions, pedestrian counts, and geographic specifics in real-time or near-real-time. In Statistical Analysis and Modeling, I'll apply statistical techniques and regression models to dissect the impact of various environmental and urban factors on pedestrian safety, tackling issues like multicollinearity and data dimensionality using methods like PCA and regularisation.
My work in Geospatial Analysis will highlight my proficiency with GIS technology, enabling me to assess and optimise pedestrian routes based on topographical data like route steepness. In the realm of Machine Learning and Predictive Modeling, I'll refine predictive models to anticipate pedestrian traffic patterns and identify risk factors, enhancing model accuracy through careful feature selection.
Software Development skills will come into play in developing interactive applications that advise users on safe pedestrian routes, integrating complex backend analytics with user-friendly interfaces. My focus on User-Centric Design and Feedback processes ensures that these tools are accessible and practical, incorporating user feedback for continuous improvement.
Project management and collaboration skills will be crucial for coordinating with stakeholders, including government bodies and public safety organisations, and for effectively communicating technical findings to inform and shape policy. Finally, my understanding of Ethical and Privacy Considerations ensures that all data handling is conducted with the utmost respect for privacy and compliance with legal standards, establishing solutions that are not only effective but also ethically sound and secure.
Introduction / background relating to problem¶
In modern urban environments, pedestrian safety is a crucial concern for city planners and public officials. As cities grow and traffic increases, the challenge of ensuring safe and accessible pedestrian pathways becomes increasingly complex. Addressing this issue requires a comprehensive understanding of the various factors that influence pedestrian safety, including geographic features, traffic patterns, and environmental conditions such as weather.
The use of data-driven approaches to urban planning offers a powerful tool to enhance pedestrian safety. By integrating and analyzing data from diverse sources—such as weather stations for real-time weather conditions, traffic sensors for vehicle and pedestrian counts, and GIS data for detailed geographic and topographical information—planners can identify high-risk areas, predict potential safety issues, and implement effective interventions.
Packages¶
In this code chunk, I establish the environment and import all necessary libraries for data analysis, geospatial processing, visualization, and machine learning. By configuring environment variables and request caching, I ensure secure and efficient API interactions. The libraries I bring in allow me to handle and visualize data, perform spatial analysis and clustering, model and evaluate pedestrian safety, and calculate optimal routes using external APIs. This setup forms the backbone of the project, enabling comprehensive analysis and visualization to enhance pedestrian safety in Melbourne.
import os
import scipy
import json
import datetime
import folium
import numpy as np
import pandas as pd
import geopandas as gpd
import logging
import requests
import requests_cache
import matplotlib.pyplot as plt
from io import StringIO
from dotenv import load_dotenv
from IPython.display import display, clear_output, HTML
from ipywidgets import interact, widgets
from scipy.spatial.distance import cdist, pdist, squareform, euclidean
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.neighbors import KDTree
from shapely.geometry import Point, LineString, Polygon, shape
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from scipy.stats import linregress
import seaborn as sns
import networkx as nx
import openrouteservice
from openrouteservice import convert
from folium.plugins import HeatMap
from retry_requests import retry
import openmeteo_requests
# Load environment variables
load_dotenv()
# Set up requests cache
requests_cache.install_cache()
import warnings
warnings.filterwarnings('ignore')
Footpath Steepness dataset¶
In this code chunk, I load the API key from environment variables to securely access the Melbourne Testbed API, which provides data on footpath steepness. By constructing a request URL and specifying parameters, I retrieve the dataset in CSV format using an HTTP GET request. The dataset, containing comprehensive information on footpath steepness, is then loaded into a pandas DataFrame for further analysis. The successful retrieval of data is validated by sampling a few records, ensuring that the data is correctly loaded and ready for processing. This step is crucial as it provides the foundational data necessary for the subsequent spatial and statistical analysis aimed at improving pedestrian safety in Melbourne.
load_dotenv()
api_key = os.environ.get("API_KEY_MOP")
base_url = 'https://melbournetestbed.opendatasoft.com/api/explore/v2.1/catalog/datasets/'
dataset_id = 'footpath-steepness'
export_format = 'csv'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    'api_key': api_key
}
url = f'{base_url}{dataset_id}/exports/{export_format}'
# GET request
response = requests.get(url, params=params)
if response.status_code == 200:
    # StringIO to read the CSV data
    url_content = response.content.decode('utf-8')
    footpath_steepness = pd.read_csv(StringIO(url_content), delimiter=';')
    print(footpath_steepness.sample(10, random_state=999))  # Test
else:
    print(f'Request failed with status code {response.status_code}')
geo_point_2d \
6939 -37.793531246364374, 144.94043501902428
5502 -37.826686836876966, 144.97103418420872
3964 -37.82336149873195, 144.96754168744584
2188 -37.79942098354816, 144.9709189315643
18168 -37.82859691392617, 144.9710174388107
22798 -37.81911654327464, 144.95069698919994
25768 -37.808123333372556, 144.95140983596548
29434 -37.79447227001396, 144.9311697639647
1798 -37.8009149408176, 144.96210417505176
25796 -37.819037758767095, 144.96080691762364
geo_shape grade1in gradepc \
6939 {"coordinates": [[[[144.94042678123358, -37.79... 160.1 0.62
5502 {"coordinates": [[[[144.9710361192438, -37.826... 96.2 1.04
3964 {"coordinates": [[[[144.96751421509197, -37.82... 45.0 2.22
2188 {"coordinates": [[[[144.9712146619255, -37.799... 53.7 1.86
18168 {"coordinates": [[[[144.9709873318095, -37.828... 19.4 5.15
22798 {"coordinates": [[[[144.9506018695263, -37.819... 8.5 11.81
25768 {"coordinates": [[[[144.95139387066837, -37.80... 89.1 1.12
29434 {"coordinates": [[[[144.93116782698365, -37.79... 14.9 6.71
1798 {"coordinates": [[[[144.9619391219952, -37.800... 32.5 3.07
25796 {"coordinates": [[[[144.9607684427436, -37.818... 16.4 6.10
segside statusid asset_type deltaz streetid mccid_int mcc_id \
6939 NaN NaN Road Footway 0.80 NaN NaN 1388715
5502 NaN NaN Road Footway 0.40 NaN NaN 1384099
3964 NaN 1.0 Road Footway 0.26 0.0 22084.0 1383936
2188 NaN 3.0 Road Footway 0.83 485.0 20674.0 1384465
18168 NaN 1.0 Road Footway 0.40 1056.0 22093.0 1384054
22798 NaN 3.0 Road Footway 1.97 117915.0 22897.0 1477315
25768 West 1.0 Road Footway 1.00 761.0 21427.0 1385490
29434 North 2.0 Road Footway 0.20 847.0 23205.0 1388129
1798 NaN NaN Road Footway 0.39 NaN NaN 1384655
25796 West 2.0 Road Footway 1.84 1424.0 20179.0 1387447
address rlmax rlmin \
6939 NaN 4.91 4.11
5502 NaN 10.20 9.80
3964 Intersection of Sturt Street and Southbank Bou... 2.83 2.57
2188 Carlow Place between Rathdowne Street and Fara... 37.11 36.28
18168 Intersection of St Kilda Road and Coventry Street 11.20 10.80
22798 Mayfield Place between Aurora Lane and Wurundj... 4.53 2.56
25768 King Street between Rosslyn Street and Stanley... 25.58 24.58
29434 Macaulay Road between Barnett Street and Eastw... 11.72 11.52
1798 NaN 32.96 32.57
25796 Market Street between Flinders Street and Flin... 4.59 2.75
distance
6939 128.10
5502 38.48
3964 11.71
2188 44.55
18168 7.76
22798 16.69
25768 89.11
29434 2.98
1798 12.70
25796 30.14
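A quick sanity check on the steepness fields, using values copied from the sampled row with index 6939 above. The relationships are inferred from the data rather than documented, so treat them as assumptions: `gradepc` appears to be the grade in percent, `grade1in` the same grade in "1 in N" form, and both are consistent with `deltaz / distance`:

```python
# Values copied from sampled row 6939 of the footpath-steepness output above
deltaz, distance = 0.80, 128.10
grade1in, gradepc = 160.1, 0.62

computed_gradepc = 100 * deltaz / distance  # rise over run, in percent
computed_grade1in = distance / deltaz       # "1 in N" form of the same grade

print(round(computed_gradepc, 2))   # matches gradepc
print(round(computed_grade1in, 1))  # matches grade1in
```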
footpath_steepness
geo_point_2d \
0 -37.823036142583945, 144.94866061456034
1 -37.79542957518662, 144.91714933764632
2 -37.79544286753349, 144.9172426574227
3 -37.79580169415494, 144.92075182140118
4 -37.79654832375531, 144.92328274904054
... ...
33580 -37.82528644947733, 144.90971619143193
33581 -37.8252692552434, 144.90973904472057
33582 -37.794217597415205, 144.91881543737387
33583 -37.793352986995224, 144.9309301120561
33584 -37.78827197433308, 144.93918224198853
geo_shape grade1in gradepc \
0 {"coordinates": [[[[144.94865791889143, -37.82... 4.2 23.81
1 {"coordinates": [[[[144.9171360775573, -37.795... NaN NaN
2 {"coordinates": [[[[144.917238930522, -37.7954... NaN NaN
3 {"coordinates": [[[144.92074176246658, -37.795... 35.1 2.85
4 {"coordinates": [[[[144.92328246984576, -37.79... 109.6 0.91
... ... ... ...
33580 {"coordinates": [[[[144.90970378816345, -37.82... 517.3 0.19
33581 {"coordinates": [[[[144.90972816098898, -37.82... 517.3 0.19
33582 {"coordinates": [[[[144.91881416724726, -37.79... 29.0 3.45
33583 {"coordinates": [[[[144.93092637131684, -37.79... 40.3 2.48
33584 {"coordinates": [[[144.93832442213275, -37.788... 25.4 3.94
segside statusid asset_type deltaz streetid mccid_int mcc_id \
0 NaN 8.0 Road Footway 6.77 3094.0 30821.0 1388075
1 NaN NaN Road Footway NaN NaN NaN 1534622
2 NaN NaN Road Footway NaN NaN NaN 1534622
3 NaN NaN Road Footway 0.23 NaN NaN 1387592
4 NaN NaN Road Footway 0.01 NaN NaN 1387085
... ... ... ... ... ... ... ...
33580 NaN NaN Road Footway 0.43 NaN NaN 1386764
33581 NaN NaN Road Footway 0.43 NaN NaN 1386764
33582 NaN NaN Road Footway 0.38 NaN NaN 1390243
33583 NaN NaN Road Footway 1.02 NaN NaN 1390225
33584 NaN 9.0 Road Footway 7.40 3129.0 30787.0 1386451
address rlmax rlmin distance
0 Yarra River 6.86 0.09 28.43
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN 2.78 2.55 8.07
4 NaN 3.39 3.38 1.11
... ... ... ... ...
33580 NaN 2.72 2.29 222.47
33581 NaN 2.72 2.29 222.47
33582 NaN 2.75 2.37 11.03
33583 NaN 9.33 8.31 41.16
33584 Upfield Railway 14.90 7.50 187.94
[33585 rows x 15 columns]
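The export-download pattern above recurs for every dataset in this notebook, so it could be factored into a small helper. This is a sketch of one possible refactor; the `fetch_dataset` and `build_export_url` names and the `raise_for_status` error handling are my additions, not from the notebook:

```python
import os
from io import StringIO

import pandas as pd
import requests

BASE_URL = 'https://melbournetestbed.opendatasoft.com/api/explore/v2.1/catalog/datasets/'

def build_export_url(dataset_id, fmt='csv'):
    """Construct the full-export URL for a Melbourne Testbed dataset."""
    return f'{BASE_URL}{dataset_id}/exports/{fmt}'

def fetch_dataset(dataset_id, api_key=None, fmt='csv'):
    """Download a dataset export and parse it into a DataFrame.

    Unlike the inline cells, this raises on HTTP errors so callers can
    handle failures explicitly instead of checking printed status codes.
    """
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC',
        'api_key': api_key or os.environ.get('API_KEY_MOP'),
    }
    response = requests.get(build_export_url(dataset_id, fmt), params=params)
    response.raise_for_status()
    return pd.read_csv(StringIO(response.content.decode('utf-8')), delimiter=';')
```

With this helper, each subsequent cell reduces to a single call such as `fetch_dataset('microclimate-sensors-data')`.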
Microclimate data¶
In this code chunk, I retrieve microclimate sensor data from the Melbourne Testbed API by securely accessing it with an API key loaded from environment variables.
load_dotenv()
api_key = os.environ.get("API_KEY_MOP")
base_url = 'https://melbournetestbed.opendatasoft.com/api/explore/v2.1/catalog/datasets/'
dataset_id = 'microclimate-sensors-data'
export_format = 'csv'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    'api_key': api_key
}
url = f'{base_url}{dataset_id}/exports/{export_format}'
# GET request
response = requests.get(url, params=params)
if response.status_code == 200:
    # StringIO to read the CSV data
    url_content = response.content.decode('utf-8')
    microclimate_data = pd.read_csv(StringIO(url_content), delimiter=';')
    print(microclimate_data.sample(10, random_state=999))  # Test
else:
    print(f'Request failed with status code {response.status_code}')
device_id received_at \
13363 ICTMicroclimate-08 2024-06-24T05:10:46+00:00
404 ICTMicroclimate-08 2024-07-11T09:35:36+00:00
63856 ICTMicroclimate-09 2024-08-23T06:30:37+00:00
9548 ICTMicroclimate-09 2024-07-01T14:42:21+00:00
59387 ICTMicroclimate-03 2024-08-09T03:33:08+00:00
43099 ICTMicroclimate-08 2024-07-28T20:31:21+00:00
18160 ICTMicroclimate-02 2024-06-20T09:15:43+00:00
62449 ICTMicroclimate-07 2024-08-17T00:13:31+00:00
70092 ICTMicroclimate-08 2024-08-28T20:40:20+00:00
24801 ICTMicroclimate-06 2024-07-04T22:39:16+00:00
sensorlocation \
13363 Swanston St - Tram Stop 13 adjacent Federation...
404 Swanston St - Tram Stop 13 adjacent Federation...
63856 SkyFarm (Jeff's Shed). Rooftop - Melbourne Con...
9548 SkyFarm (Jeff's Shed). Rooftop - Melbourne Con...
59387 CH1 rooftop
43099 Swanston St - Tram Stop 13 adjacent Federation...
18160 101 Collins St L11 Rooftop
62449 Tram Stop 7C - Melbourne Tennis Centre Precinc...
70092 Swanston St - Tram Stop 13 adjacent Federation...
24801 Tram Stop 7B - Melbourne Tennis Centre Precinc...
latlong minimumwinddirection averagewinddirection \
13363 -37.8184515, 144.9678474 0.0 311.0
404 -37.8184515, 144.9678474 0.0 288.0
63856 -37.8223306, 144.9521696 0.0 333.0
9548 -37.8223306, 144.9521696 0.0 5.0
59387 -37.8140348, 144.96728 0.0 94.0
43099 -37.8184515, 144.9678474 0.0 345.0
18160 -37.814604, 144.9702991 0.0 31.0
62449 -37.8222341, 144.9829409 0.0 326.0
70092 -37.8184515, 144.9678474 0.0 333.0
24801 -37.8194993, 144.9787211 0.0 9.0
maximumwinddirection minimumwindspeed averagewindspeed \
13363 359.0 0.0 1.1
404 359.0 0.0 2.4
63856 359.0 0.0 0.6
9548 359.0 0.0 0.4
59387 332.0 0.0 1.0
43099 359.0 0.0 0.7
18160 358.0 0.0 0.3
62449 353.0 0.0 0.5
70092 359.0 0.0 0.7
24801 359.0 0.0 0.2
gustwindspeed airtemperature relativehumidity atmosphericpressure \
13363 5.4 12.9 55.3 1013.9
404 6.0 13.3 64.7 1012.3
63856 4.9 17.7 57.4 1012.7
9548 1.4 9.2 73.2 1029.5
59387 2.7 17.8 46.1 1018.3
43099 2.5 8.2 86.7 1029.3
18160 1.1 11.8 67.0 1011.7
62449 2.6 11.5 83.5 1007.7
70092 2.8 12.1 60.8 1012.1
24801 3.6 8.7 86.9 1039.9
pm25 pm10 noise
13363 12.0 13.0 77.7
404 6.0 7.0 72.6
63856 1.0 1.0 61.8
9548 7.0 10.0 56.9
59387 4.0 6.0 71.3
43099 3.0 4.0 63.7
18160 29.0 33.0 70.3
62449 3.0 3.0 65.9
70092 5.0 9.0 67.4
24801 51.0 61.0 61.2
microclimate_data
device_id received_at \
0 ICTMicroclimate-09 2024-07-17T15:33:32+00:00
1 ICTMicroclimate-03 2024-07-17T15:06:13+00:00
2 ICTMicroclimate-07 2024-07-17T15:21:33+00:00
3 ICTMicroclimate-08 2024-07-17T15:40:34+00:00
4 ICTMicroclimate-02 2024-07-17T15:42:47+00:00
... ... ...
87669 ICTMicroclimate-09 2024-09-18T01:53:01+00:00
87670 ICTMicroclimate-06 2024-09-18T01:54:10+00:00
87671 ICTMicroclimate-01 2024-09-18T01:58:23+00:00
87672 ICTMicroclimate-03 2024-09-18T01:55:40+00:00
87673 ICTMicroclimate-10 2024-09-18T02:03:46+00:00
sensorlocation \
0 SkyFarm (Jeff's Shed). Rooftop - Melbourne Con...
1 CH1 rooftop
2 Tram Stop 7C - Melbourne Tennis Centre Precinc...
3 Swanston St - Tram Stop 13 adjacent Federation...
4 101 Collins St L11 Rooftop
... ...
87669 SkyFarm (Jeff's Shed). Rooftop - Melbourne Con...
87670 Tram Stop 7B - Melbourne Tennis Centre Precinc...
87671 Birrarung Marr Park - Pole 1131
87672 CH1 rooftop
87673 NaN
latlong minimumwinddirection averagewinddirection \
0 -37.8223306, 144.9521696 0.0 300.0
1 -37.8140348, 144.96728 0.0 308.0
2 -37.8222341, 144.9829409 0.0 262.0
3 -37.8184515, 144.9678474 0.0 339.0
4 -37.814604, 144.9702991 7.0 118.0
... ... ... ...
87669 -37.8223306, 144.9521696 0.0 253.0
87670 -37.8194993, 144.9787211 0.0 54.0
87671 -37.8185931, 144.9716404 NaN 76.0
87672 -37.8140348, 144.96728 90.0 30.0
87673 NaN 0.0 29.0
maximumwinddirection minimumwindspeed averagewindspeed \
0 359.0 0.0 0.9
1 349.0 0.0 0.4
2 354.0 0.0 0.4
3 359.0 0.0 0.9
4 261.0 1.4 2.1
... ... ... ...
87669 359.0 0.0 3.0
87670 359.0 0.0 1.9
87671 NaN NaN 0.4
87672 90.0 0.7 1.4
87673 357.0 0.7 2.3
gustwindspeed airtemperature relativehumidity atmosphericpressure \
0 3.5 8.7 86.3 1013.100000
1 1.0 8.5 99.0 1008.700000
2 1.6 9.0 85.0 1016.100000
3 4.3 9.0 83.9 1014.100000
4 4.1 9.0 96.7 1009.400000
... ... ... ... ...
87669 7.8 19.1 24.2 1008.700000
87670 8.1 19.7 23.3 1010.100000
87671 NaN 19.6 22.0 1009.299988
87672 1.1 19.8 20.8 1004.500000
87673 3.8 19.7 21.7 1005.600000
pm25 pm10 noise
0 1.0 4.0 63.100000
1 3.0 5.0 69.700000
2 0.0 0.0 55.300000
3 1.0 1.0 60.600000
4 8.0 11.0 69.000000
... ... ... ...
87669 0.0 0.0 65.700000
87670 1.0 1.0 82.200000
87671 0.0 3.0 55.599998
87672 2.0 4.0 71.000000
87673 2.0 4.0 90.400000
[87674 rows x 16 columns]
Pedestrian Monthly Counts per Hour dataset¶
In this snippet, I retrieve hourly pedestrian count data from an API, which I then load into a pandas DataFrame for processing. This data is crucial for analysing pedestrian traffic patterns. I ensure the completeness of the time series by filling in any missing timestamps and replacing missing data with zeros. This preparation is essential for accurate analysis and modelling in my project aimed at enhancing pedestrian safety.
# Load environment variables
load_dotenv()
api_key = os.environ.get("API_KEY_MOP")
# Define the base URL and dataset parameters
base_url = 'https://melbournetestbed.opendatasoft.com/api/explore/v2.1/catalog/datasets/'
dataset_id = 'pedestrian-counting-system-monthly-counts-per-hour'
export_format = 'csv'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    'api_key': api_key
}
url = f'{base_url}{dataset_id}/exports/{export_format}'
# GET request to fetch data
response = requests.get(url, params=params)
if response.status_code == 200:
    # Read the CSV data from the response
    url_content = response.content.decode('utf-8')
    pedestrian_count = pd.read_csv(StringIO(url_content), delimiter=';')
    # Combine 'sensing_date' and 'hourday' to create a 'timestamp' column
    pedestrian_count['sensing_date'] = pd.to_datetime(pedestrian_count['sensing_date'])
    pedestrian_count['timestamp'] = pedestrian_count['sensing_date'] + pd.to_timedelta(pedestrian_count['hourday'], unit='h')
    # Generate a continuous range of hours between the min and max timestamps
    all_hours = pd.date_range(start=pedestrian_count['timestamp'].min(), end=pedestrian_count['timestamp'].max(), freq='h')
    all_hours_df = pd.DataFrame({'timestamp': all_hours})
    # Merge with the original DataFrame to fill in missing rows
    pedestrian_count = pd.merge(all_hours_df, pedestrian_count, on='timestamp', how='left')
    # Fill NaN values with 0
    pedestrian_count.fillna(0, inplace=True)
    # Display the DataFrame
    print(pedestrian_count)
    # Print a sample of the data for testing
    print(pedestrian_count.sample(10, random_state=999))
else:
    print(f'Request failed with status code {response.status_code}')
timestamp id location_id sensing_date \
0 2021-07-01 00:00:00 2.802021e+10 28.0 2021-07-01 00:00:00
1 2021-07-01 00:00:00 2.902021e+10 29.0 2021-07-01 00:00:00
2 2021-07-01 00:00:00 9.020211e+09 9.0 2021-07-01 00:00:00
3 2021-07-01 00:00:00 7.602021e+10 76.0 2021-07-01 00:00:00
4 2021-07-01 00:00:00 4.802021e+10 48.0 2021-07-01 00:00:00
... ... ... ... ...
1850400 2024-09-17 03:00:00 1.432024e+10 14.0 2024-09-17 00:00:00
1850401 2024-09-17 03:00:00 1.423202e+11 142.0 2024-09-17 00:00:00
1850402 2024-09-17 03:00:00 6.132024e+10 61.0 2024-09-17 00:00:00
1850403 2024-09-17 03:00:00 1.032024e+10 10.0 2024-09-17 00:00:00
1850404 2024-09-17 03:00:00 6.332024e+10 63.0 2024-09-17 00:00:00
hourday direction_1 direction_2 pedestriancount sensor_name \
0 0.0 24.0 107.0 131.0 VAC_T
1 0.0 8.0 10.0 18.0 AG_T
2 0.0 4.0 6.0 10.0 Col700_T
3 0.0 1.0 0.0 1.0 KenMac_T
4 0.0 3.0 10.0 13.0 QVMQ_T
... ... ... ... ... ...
1850400 3.0 4.0 0.0 4.0 SanBri_T
1850401 3.0 1.0 0.0 1.0 Hammer1584_T
1850402 3.0 9.0 6.0 15.0 RMIT14_T
1850403 3.0 0.0 1.0 1.0 BouHbr_T
1850404 3.0 1.0 4.0 5.0 Bou231_T
location
0 -37.82129925, 144.96879309
1 -37.8199817, 144.96872865
2 -37.81982992, 144.95102555
3 -37.79453803, 144.93036194
4 -37.80631581, 144.95866697
... ...
1850400 -37.82011242, 144.96291897
1850401 -37.81970749, 144.96795734
1850402 -37.80767455, 144.96309114
1850403 -37.81876474, 144.94710545
1850404 -37.81333081, 144.96675571
[1850405 rows x 10 columns]
timestamp id location_id sensing_date \
1507199 2024-03-14 12:00:00 1.371220e+12 137.0 2024-03-14 00:00:00
55880 2021-08-11 01:00:00 4.312021e+10 43.0 2021-08-11 00:00:00
1272317 2023-11-07 09:00:00 7.592023e+10 75.0 2023-11-07 00:00:00
1175349 2023-09-11 20:00:00 1.072020e+12 107.0 2023-09-11 00:00:00
657384 2022-10-09 10:00:00 4.810202e+11 48.0 2022-10-09 00:00:00
215207 2021-12-04 16:00:00 3.716202e+11 37.0 2021-12-04 00:00:00
164481 2021-10-29 14:00:00 6.614202e+11 66.0 2021-10-29 00:00:00
1609230 2024-05-09 14:00:00 4.614202e+11 46.0 2024-05-09 00:00:00
1091939 2023-07-21 21:00:00 6.921202e+11 69.0 2023-07-21 00:00:00
1838814 2024-09-11 00:00:00 6.020241e+09 6.0 2024-09-11 00:00:00
hourday direction_1 direction_2 pedestriancount sensor_name \
1507199 12.0 35.0 113.0 148.0 BouHbr2353_T
55880 1.0 1.0 0.0 1.0 UM2_T
1272317 9.0 24.0 15.0 39.0 SprFli_T
1175349 20.0 69.0 58.0 127.0 280Will_T
657384 10.0 204.0 214.0 418.0 QVMQ_T
215207 16.0 97.0 115.0 212.0 Lyg260_T
164481 14.0 282.0 286.0 568.0 QVN_T
1609230 14.0 106.0 164.0 270.0 Pel147_T
1091939 21.0 51.0 27.0 78.0 FLDegC_T
1838814 0.0 55.0 72.0 127.0 FliS_T
location
1507199 -37.81894815, 144.94612292
55880 -37.79844526, 144.96411782
1272317 -37.81515276, 144.97467661
1175349 -37.81246271, 144.95690188
657384 -37.80631581, 144.95866697
215207 -37.80107122, 144.96704554
164481 -37.81057846, 144.96444294
1609230 -37.80240719, 144.9615673
1091939 -37.81687226, 144.96559144
1838814 -37.81911705, 144.96558255
zero_count = (pedestrian_count == 0).sum()
zero_count
timestamp              0
id                    12
location_id           12
sensing_date          12
hourday            72597
direction_1        35171
direction_2        35646
pedestriancount      248
sensor_name           12
location              12
dtype: int64
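The zero counts above hint at a caveat in the gap-filling step: merging a global hourly range on `timestamp` alone only guarantees that each hour appears somewhere across all sensors combined, and `fillna(0)` also zeroes identifier columns such as `id` and `location`. One alternative, sketched on a tiny synthetic frame (the toy data and reduced column set are illustrative), is to reindex each sensor against the full hourly range separately:

```python
import pandas as pd

# Toy frame: two sensors, each missing a different hour
raw = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2024-09-01 00:00', '2024-09-01 02:00',   # sensor 1 missing 01:00
        '2024-09-01 00:00', '2024-09-01 01:00',   # sensor 2 missing 02:00
    ]),
    'location_id': [1, 1, 2, 2],
    'pedestriancount': [10, 12, 5, 7],
})

# Full (sensor, hour) grid, then reindex so every sensor has every hour
full_hours = pd.date_range(raw['timestamp'].min(), raw['timestamp'].max(), freq='h')
sensors = raw['location_id'].unique()
full_index = pd.MultiIndex.from_product([sensors, full_hours],
                                        names=['location_id', 'timestamp'])
filled = (raw.set_index(['location_id', 'timestamp'])
             .reindex(full_index)
             .fillna({'pedestriancount': 0})   # zero only the count column
             .reset_index())
```

This keeps the sensor identity intact on the filled rows, so per-sensor trends are not distorted by zeroed metadata.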
Pedestrian Counting System - Past Hour (counts per minute) Dataset¶
In this code chunk, I fetch pedestrian counting data from the Melbourne Testbed API, which tracks the number of pedestrians counted per minute over the past hour. By securely loading the API key from environment variables, I ensure that sensitive information remains protected. The code constructs a request URL with appropriate parameters to download the dataset in CSV format.
load_dotenv()
api_key = os.environ.get("API_KEY_MOP")
base_url = 'https://melbournetestbed.opendatasoft.com/api/explore/v2.1/catalog/datasets/'
dataset_id = 'pedestrian-counting-system-past-hour-counts-per-minute'
export_format = 'csv'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    'api_key': api_key
}
url = f'{base_url}{dataset_id}/exports/{export_format}'
# GET request
response = requests.get(url, params=params)
if response.status_code == 200:
    # StringIO to read the CSV data
    url_content = response.content.decode('utf-8')
    pedestrian_count_min = pd.read_csv(StringIO(url_content), delimiter=';')
    print(pedestrian_count_min.sample(10, random_state=999))  # Test
else:
    print(f'Request failed with status code {response.status_code}')
location_id sensing_datetime sensing_date sensing_time \
48743 47 2024-09-15T05:34:00+00:00 2024-09-15 15:34
147979 108 2024-09-16T21:29:00+00:00 2024-09-17 07:29
137809 27 2024-09-16T21:50:00+00:00 2024-09-17 07:50
55342 10 2024-09-16T03:02:00+00:00 2024-09-16 13:02
194870 109 2024-09-17T23:40:00+00:00 2024-09-18 09:40
199960 85 2024-09-17T20:37:00+00:00 2024-09-18 06:37
24257 28 2024-09-15T22:38:00+00:00 2024-09-16 08:38
189047 14 2024-09-17T15:54:00+00:00 2024-09-18 01:54
162812 9 2024-09-16T20:51:00+00:00 2024-09-17 06:51
147729 76 2024-09-16T21:55:00+00:00 2024-09-17 07:55
direction_1 direction_2 total_of_directions
48743 26 30 56
147979 1 26 27
137809 1 0 1
55342 2 2 4
194870 7 12 19
199960 0 4 4
24257 14 6 20
189047 0 1 1
162812 1 20 21
147729 0 1 1
This code filters the pedestrian counting data to retain only records at the top of each hour by selecting rows where the minute value is zero. This conversion to hourly data (hourly_df) simplifies the analysis of pedestrian trends over time, making it more suitable for understanding broader movement patterns in urban planning.
# Keep only rows recorded exactly on the hour (minute == 0)
pedestrian_count_min['sensing_datetime'] = pd.to_datetime(pedestrian_count_min['sensing_datetime'])
hourly_df = pedestrian_count_min[pedestrian_count_min['sensing_datetime'].dt.minute == 0]
hourly_df
| | location_id | sensing_datetime | sensing_date | sensing_time | direction_1 | direction_2 | total_of_directions |
|---|---|---|---|---|---|---|---|
| 1 | 107 | 2024-09-15 14:00:00+00:00 | 2024-09-16 | 00:00 | 2 | 2 | 4 |
| 6 | 131 | 2024-09-15 14:00:00+00:00 | 2024-09-16 | 00:00 | 2 | 0 | 2 |
| 8 | 134 | 2024-09-15 14:00:00+00:00 | 2024-09-16 | 00:00 | 3 | 15 | 18 |
| 49 | 6 | 2024-09-15 14:00:00+00:00 | 2024-09-16 | 00:00 | 1 | 5 | 6 |
| 52 | 14 | 2024-09-15 14:00:00+00:00 | 2024-09-16 | 00:00 | 3 | 1 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 216551 | 79 | 2024-09-18 02:00:00+00:00 | 2024-09-18 | 12:00 | 2 | 11 | 13 |
| 216573 | 85 | 2024-09-18 02:00:00+00:00 | 2024-09-18 | 12:00 | 1 | 0 | 1 |
| 216621 | 137 | 2024-09-18 02:00:00+00:00 | 2024-09-18 | 12:00 | 2 | 8 | 10 |
| 216634 | 141 | 2024-09-18 02:00:00+00:00 | 2024-09-18 | 12:00 | 20 | 7 | 27 |
| 216639 | 142 | 2024-09-18 02:00:00+00:00 | 2024-09-18 | 12:00 | 37 | 34 | 71 |
4974 rows × 7 columns
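Note that keeping only minute-zero rows retains a single one-minute sample per hour rather than an hourly total. If an hourly volume were wanted instead, the per-minute counts could be summed with `resample`; the sketch below uses a synthetic single-sensor frame (the constant per-minute counts are illustrative):

```python
import pandas as pd

# Toy per-minute counts for one sensor across two hours
minute_counts = pd.DataFrame({
    'sensing_datetime': pd.date_range('2024-09-18 00:00', periods=120, freq='min'),
    'total_of_directions': [1] * 120,
})

# Sum the 60 one-minute observations within each hour
hourly_totals = (minute_counts
                 .set_index('sensing_datetime')['total_of_directions']
                 .resample('h')
                 .sum())
```

Which aggregation is appropriate depends on whether downstream models expect instantaneous flow or hourly volume; the point is only that the two differ by a factor of up to 60.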
In this code, I convert the sensing_datetime column to a datetime format to facilitate time-based analysis. I then determine the earliest and latest timestamps in the dataset, which helps establish the timeframe covered by the pedestrian counting data. This step is crucial for understanding the temporal scope of the data, allowing me to analyze pedestrian trends within a defined period.
# Ensure 'sensing_datetime' is parsed as datetime, then find the data's time span
pedestrian_count_min['sensing_datetime'] = pd.to_datetime(pedestrian_count_min['sensing_datetime'])
earliest_timestamp = pedestrian_count_min['sensing_datetime'].min()
latest_timestamp = pedestrian_count_min['sensing_datetime'].max()
print("Earliest Timestamp:", earliest_timestamp)
print("Latest Timestamp:", latest_timestamp)
Earliest Timestamp: 2024-09-14 13:55:00+00:00
Latest Timestamp: 2024-09-18 02:55:00+00:00
Pedestrian Counting System Locations dataset¶
In this code snippet, I access pedestrian sensor location data from an open data API to enhance my analysis of pedestrian traffic patterns. After successfully fetching the data, I convert it into a pandas DataFrame to facilitate further analysis, such as mapping sensor locations using GIS technology. This process is critical for accurately determining the distribution of pedestrian traffic and planning safety measures effectively in my data-driven urban planning project.
load_dotenv()
api_key = os.environ.get("API_KEY_MOP")
base_url = 'https://melbournetestbed.opendatasoft.com/api/explore/v2.1/catalog/datasets/'
dataset_id = 'pedestrian-counting-system-sensor-locations'
export_format = 'csv'
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    'api_key': api_key
}
url = f'{base_url}{dataset_id}/exports/{export_format}'
# GET request
response = requests.get(url, params=params)
if response.status_code == 200:
    # StringIO to read the CSV data
    url_content = response.content.decode('utf-8')
    pedestrian_sensor_locations = pd.read_csv(StringIO(url_content), delimiter=';')
    print(pedestrian_sensor_locations.sample(10, random_state=999))  # Test
else:
    print(f'Request failed with status code {response.status_code}')
location_id sensor_description \
55 44 Tin Alley-Swanston St (West)
41 138 COM Pole 1671 - Enterprize Park, Queens Bridge
24 78 Harbour Esplanade (West) - Bike Path
59 50 Faraday St-Lygon St (West)
37 118 114 Flinders Street Car Park Crossing
82 150 narrm ngarrgu Library - Level 1 Main Stairs B
28 85 Macaulay Rd (North)
107 54 Lincoln-Swanston (West)
64 67 Flinders Ln -Degraves St (South)
29 90 Boyd Community Hub- Library
sensor_name installation_date note \
55 UM3_T 2015-04-15 Pushbox Upgrade, 30/06/2023
41 EntPark1671_T 2023-11-20 NaN
24 HarEsB_T 2021-03-30 NaN
59 Lyg309_T 2017-11-30 Pushbox Upgrade, 25/07/2023
37 Fli114C_T 2022-12-06 NaN
82 narrLibL1MB_T 2023-10-23 NaN
28 488Mac_T 2021-12-21 NaN
107 Swa607_T 2018-06-26 NaN
64 FLDegS_T 2020-06-03 NaN
29 BoCoL_T 2015-08-11 NaN
location_type status direction_1 direction_2 latitude longitude \
55 Outdoor A North South -37.796987 144.964413
41 Outdoor A East West -37.819965 144.959815
24 Outdoor A North South -37.814716 144.944651
59 Outdoor A North South -37.798082 144.967210
37 Outdoor A North South -37.816328 144.970905
82 Indoor A NaN NaN -37.807912 144.958201
28 Outdoor A East West -37.794324 144.929734
107 Outdoor A North South -37.804024 144.963084
64 Outdoor A East West -37.816888 144.965626
29 Indoor A NaN NaN -37.825562 144.961154
location
55 -37.79698741, 144.96441306
41 -37.81996544, 144.95981454
24 -37.81471642, 144.9446508
59 -37.79808192, 144.96721013
37 -37.81632783, 144.97090512
82 -37.80791198, 144.95820087
28 -37.79432415, 144.92973378
107 -37.804024, 144.96308399
64 -37.81688755, 144.96562569
29 -37.82556207, 144.96115421
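Later spatial joins will need a way to attach footpath segments or events to their nearest pedestrian sensor. One common approach, sketched here with made-up coordinates, is a `KDTree` lookup; note that treating raw lat/lon as planar is an approximation that is acceptable at city scale but labelled here as an assumption:

```python
import numpy as np
from sklearn.neighbors import KDTree

# Hypothetical sensor coordinates (lat, lon) and two query points
sensor_coords = np.array([
    [-37.7970, 144.9644],   # e.g. a Swanston St sensor
    [-37.8147, 144.9447],   # e.g. a Harbour Esplanade sensor
])
queries = np.array([
    [-37.7975, 144.9650],
    [-37.8150, 144.9440],
])

tree = KDTree(sensor_coords)
dist, idx = tree.query(queries, k=1)   # nearest sensor for each query point
nearest = idx.ravel()
```

For higher accuracy, coordinates could first be projected to a metric CRS (e.g. with geopandas) before building the tree.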
Street Names Dataset¶
In this snippet, I utilize API data to fetch a list of street names in CSV format, which I load into a pandas DataFrame. This information is essential for associating geographic and traffic data with specific street locations, allowing for a more granular analysis of pedestrian safety across different areas. This method ensures that my urban planning project effectively utilizes real-time data for decision-making and planning interventions.
load_dotenv()
api_key = os.environ.get("API_KEY_MOP")
base_url = 'https://melbournetestbed.opendatasoft.com/api/explore/v2.1/catalog/datasets/'
dataset_id = 'street-names'
export_format = 'csv'  # avoid shadowing the built-in `format`
params = {
    'select': '*',
    'limit': -1,  # all records
    'lang': 'en',
    'timezone': 'UTC',
    'api_key': api_key
}
url = f'{base_url}{dataset_id}/exports/{export_format}'
# GET request
response = requests.get(url, params=params)
if response.status_code == 200:
    # Use StringIO to read the CSV payload into pandas
    url_content = response.content.decode('utf-8')
    street_names = pd.read_csv(StringIO(url_content), delimiter=';')
    print(street_names.sample(10, random_state=999))  # Test
else:
    print(f'Request failed with status code {response.status_code}')
geo_point_2d \
680 -37.8063424022535, 144.944400840844
782 -37.7922130265025, 144.939455640246
2121 -37.80603335366346, 144.94330724359577
1073 -37.84057585833539, 145.00460081586272
1278 -37.79342942689862, 144.91844441272363
2211 -37.794507128057504, 144.920538630693
303 -37.795323682507004, 144.94622660323301
896 -37.809746212595, 144.946085238719
2552 -37.821777475467, 144.935491325427
809 -37.79263026699863, 144.91941442142388
geo_shape mccid_gis \
680 {"coordinates": [[144.944487580612, -37.806378... 310
782 {"coordinates": [[144.942059049471, -37.792498... 35
2121 {"coordinates": [[144.943140438154, -37.805906... 865
1073 {"coordinates": [[144.995482513629, -37.839488... 368
1278 {"coordinates": [[144.918035092016, -37.792981... 30
2211 {"coordinates": [[144.920598774622, -37.794572... 436
303 {"coordinates": [[144.946172417995, -37.795317... 187
896 {"coordinates": [[144.946177050706, -37.809831... 137
2552 {"coordinates": [[144.935097756379, -37.821686... 1274
809 {"coordinates": [[144.922850575992, -37.791069... 73
maplabel name mccid_str xdate
680 PL5141 PL5141 Street_Label_2000 20210923
782 Sutton Street SUTTON STREET Street_Label_15000 20210923
2121 CL1412 CL1412 Street_Label_2000 20210923
1073 NaN TOORAK RD STREET_NAME_EXT_10000_Label 20160122
1278 Willis Street WILLIS STREET Street_Label_10000 20210923
2211 Matthews Mews MATTHEWS MEWS Street_Label_2000 20210923
303 PL5200 PL5200 Street_Label_1000 20210923
896 PL5106 PL5106 Street_Label_1000 20210923
2552 Catalina Place CATALINA PLACE Street_Label_2000 20210923
809 Stockmans Way STOCKMANS WAY Street_Label_10000 20210923
street_names
| geo_point_2d | geo_shape | mccid_gis | maplabel | name | mccid_str | xdate | |
|---|---|---|---|---|---|---|---|
| 0 | -37.83011414410377, 144.95268063216 | {"coordinates": [[144.95328861584, -37.8298049... | 39 | NaN | BUCKHURST LA | STREET_NAME_EXT_5000_Label | 20160122 |
| 1 | -37.774964845363, 144.938994281833 | {"coordinates": [[144.938916491966, -37.775396... | 65 | NaN | GIBSON AV | STREET_NAME_EXT_10000_Label | 20160122 |
| 2 | -37.833624678099, 144.9483213738935 | {"coordinates": [[144.948253784366, -37.833456... | 63 | NaN | BARKLY AV | STREET_NAME_EXT_5000_Label | 20160122 |
| 3 | -37.800287679660904, 144.9549082867173 | {"coordinates": [[144.954700495049, -37.800111... | 21 | Wreckyn Place | WRECKYN PLACE | Street_Label_2000 | 20210923 |
| 4 | -37.7821603522835, 144.9074255254285 | {"coordinates": [[144.907011990752, -37.781713... | 31 | NaN | CHAUVEL ST | STREET_NAME_EXT_10000_Label | 20160122 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2871 | -37.796458014327, 144.949704329739 | {"coordinates": [[144.949513460746, -37.796304... | 545 | Chapman Lane | CHAPMAN LANE | Street_Label_2000 | 20210923 |
| 2872 | -37.80929597646273, 144.95041403139217 | {"coordinates": [[144.95036270025, -37.8092539... | 1292 | CL1115 | CL1115 | Street_Label_2000 | 20210923 |
| 2873 | -37.822414318054754, 144.9373137961169 | {"coordinates": [[144.940627978509, -37.823345... | 4 | South Wharf Drive | SOUTH WHARF DRIVE | Street_Label_10000 | 20210923 |
| 2874 | -37.79208682782973, 144.92245047769802 | {"coordinates": [[144.92211162733, -37.7922716... | 671 | Gardner Lane | GARDNER LANE | Street_Label_2000 | 20210923 |
| 2875 | -37.795239920115, 144.967131047208 | {"coordinates": [[144.967699834798, -37.795302... | 651 | Waterloo Street | WATERLOO STREET | Street_Label_2000 | 20210923 |
2876 rows × 7 columns
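To join these street records with other spatial data later, the `geo_point_2d` field (a `"lat, lon"` string) can be split into numeric columns. A minimal sketch, assuming `street_names` was loaded as above; the new column names `street_lat`/`street_lon` are my own:

```python
import pandas as pd

def add_point_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Split the 'lat, lon' string in geo_point_2d into float columns."""
    parts = df['geo_point_2d'].str.split(',', expand=True)
    return df.assign(
        street_lat=parts[0].astype(float),
        street_lon=parts[1].astype(float),
    )

# e.g. street_names = add_point_columns(street_names)
```

This keeps the original string column intact while exposing coordinates that folium or a spatial join can consume directly.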
Merge pedestrian counts and locations¶
In this code snippet, I merge two datasets — pedestrian count data and sensor location data — on their shared location_id key. The merged DataFrame lets me analyze pedestrian counts in the context of their specific locations, which is crucial for the spatial analysis in my project.
pedestrian_merged_data = pd.merge(hourly_df, pedestrian_sensor_locations, on='location_id', how='inner')
# pedestrian_merged_data.sort_values(by='timestamp',ascending=False)
Find the earliest and latest timestamps¶
In this code, I convert the sensing_datetime column of the pedestrian data to a datetime format for easier analysis. I then extract and display the earliest and latest timestamps to assess the temporal range of the data, ensuring that the analysis is timely and relevant for current urban planning needs.
# Convert 'timestamp' column to datetime
pedestrian_merged_data['sensing_datetime'] = pd.to_datetime(pedestrian_merged_data['sensing_datetime'])
earliest_timestamp = pedestrian_merged_data['sensing_datetime'].min()
latest_timestamp = pedestrian_merged_data['sensing_datetime'].max()
print("Earliest Timestamp:", earliest_timestamp)
print("Latest Timestamp:", latest_timestamp)
Earliest Timestamp: 2024-09-14 14:00:00+00:00 Latest Timestamp: 2024-09-18 02:00:00+00:00
Filter data by date (Last Month)¶
This code filters the pedestrian data to include only records from the last month. By setting the end_date to today and the start_date to one month prior, I create a time range for filtering. I then extract the date from the sensing_datetime column and filter the dataset (pedestrian_merged_data) to keep only the entries within this one month. The resulting filtered_data_last_month DataFrame provides a focused view of pedestrian activity over the past month, which is useful for recent trend analysis and short-term planning.
# Define the end date as today
end_date = pd.Timestamp.today().date()
# Define the start date as one month before the end date
start_date = (pd.Timestamp.today() - pd.DateOffset(months=1)).date()
# Extract the date from the 'sensing_datetime' column
pedestrian_merged_data['date_only'] = pedestrian_merged_data['sensing_datetime'].dt.date
# Filter the combined data DataFrame by the last month
filtered_data_last_month = pedestrian_merged_data[
    (pedestrian_merged_data['date_only'] >= start_date) &
    (pedestrian_merged_data['date_only'] <= end_date)
]
# Display the filtered data for the last month
print("Filtered data for the last month:")
print(filtered_data_last_month.head())
Filtered data for the last month:
location_id sensing_datetime sensing_date sensing_time \
0 107 2024-09-15 14:00:00+00:00 2024-09-16 00:00
1 107 2024-09-15 12:00:00+00:00 2024-09-15 22:00
2 107 2024-09-15 05:00:00+00:00 2024-09-15 15:00
3 107 2024-09-15 01:00:00+00:00 2024-09-15 11:00
4 107 2024-09-15 00:00:00+00:00 2024-09-15 10:00
direction_1_x direction_2_x total_of_directions \
0 2 2 4
1 0 1 1
2 9 11 20
3 17 4 21
4 9 10 19
sensor_description sensor_name installation_date note location_type \
0 Flagstaff station (East) 280Will_T 2022-10-08 NaN Outdoor
1 Flagstaff station (East) 280Will_T 2022-10-08 NaN Outdoor
2 Flagstaff station (East) 280Will_T 2022-10-08 NaN Outdoor
3 Flagstaff station (East) 280Will_T 2022-10-08 NaN Outdoor
4 Flagstaff station (East) 280Will_T 2022-10-08 NaN Outdoor
status direction_1_y direction_2_y latitude longitude \
0 A North South -37.812463 144.956902
1 A North South -37.812463 144.956902
2 A North South -37.812463 144.956902
3 A North South -37.812463 144.956902
4 A North South -37.812463 144.956902
location date_only
0 -37.81246271, 144.95690188 2024-09-15
1 -37.81246271, 144.95690188 2024-09-15
2 -37.81246271, 144.95690188 2024-09-15
3 -37.81246271, 144.95690188 2024-09-15
4 -37.81246271, 144.95690188 2024-09-15
# Rename on a copy rather than in place, to avoid pandas chained-assignment warnings
filtered_data_last_month = filtered_data_last_month.rename(columns={'sensing_datetime': 'timestamp'})
Request climate data through API using latitude and longitude data¶
In this code snippet, I extract the latitude and longitude coordinates of a specific pedestrian sensor location by accessing the first row's values in the pedestrian_sensor_locations DataFrame. These coordinates (pedestrian_latitude and pedestrian_longitude) are crucial for mapping the sensor's location and integrating it with other spatial data, such as climate data. This step enables geographic analysis and visualization, allowing for a better understanding of pedestrian activity at specific locations.
pedestrian_latitude = pedestrian_sensor_locations['latitude'].values[0]
pedestrian_longitude = pedestrian_sensor_locations['longitude'].values[0]
Open-Meteo API¶
In this code chunk, I set up a client to retrieve climate data using the Open-Meteo API, ensuring robust data collection with caching and retry mechanisms in case of errors. The get_climate_data function fetches hourly climate variables—such as temperature, humidity, and precipitation—based on the latitude and longitude of pedestrian sensor locations. The data is processed and stored in a pandas DataFrame, where each row represents hourly climate conditions for a specific location.
The process is repeated for all sensor locations, and the resulting data is combined into a single DataFrame, climate_data_combined. This dataset is essential for integrating weather conditions with pedestrian and footpath data, allowing for a comprehensive analysis of how microclimate factors might impact pedestrian activity and safety in different parts of Melbourne.
# Setup the Open-Meteo API client with cache and retry on error
cache_session = requests_cache.CachedSession('.cache', expire_after=3600)
retry_session = retry(cache_session, retries=5, backoff_factor=0.2)
openmeteo = openmeteo_requests.Client(session=retry_session)
def get_climate_data(latitude, longitude):
    url = "https://api.open-meteo.com/v1/forecast"
    params = {
        "latitude": latitude,    # use the function arguments so each sensor gets its own forecast
        "longitude": longitude,  # (not the module-level first-row coordinates)
        "current": "relative_humidity_2m",
        "hourly": ["temperature_2m", "relative_humidity_2m", "precipitation", "rain", "showers", "weather_code", "uv_index"],
        "past_days": 92
    }
    responses = openmeteo.weather_api(url, params=params)
    # Process the first location. Add a for-loop for multiple locations or weather models
    response = responses[0]
    # Process hourly data
    hourly = response.Hourly()
    hourly_temperature_2m = hourly.Variables(0).ValuesAsNumpy()
    hourly_relative_humidity_2m = hourly.Variables(1).ValuesAsNumpy()
    hourly_precipitation = hourly.Variables(2).ValuesAsNumpy()
    hourly_rain = hourly.Variables(3).ValuesAsNumpy()
    hourly_showers = hourly.Variables(4).ValuesAsNumpy()
    hourly_weather_code = hourly.Variables(5).ValuesAsNumpy()
    hourly_uv_index = hourly.Variables(6).ValuesAsNumpy()
    hourly_data = {
        "latitude": latitude,
        "longitude": longitude,
        "date": pd.date_range(
            start=pd.to_datetime(hourly.Time(), unit="s", utc=True),
            end=pd.to_datetime(hourly.TimeEnd(), unit="s", utc=True),
            freq=pd.Timedelta(seconds=hourly.Interval()),
            inclusive="left"
        ),
        "temperature_2m": hourly_temperature_2m,
        "relative_humidity_2m": hourly_relative_humidity_2m,
        "precipitation": hourly_precipitation,
        "rain": hourly_rain,
        "showers": hourly_showers,
        "weather_code": hourly_weather_code,
        "uv_index": hourly_uv_index
    }
    hourly_dataframe = pd.DataFrame(data=hourly_data)
    return hourly_dataframe
# Initialize an empty list to store all climate dataframes
all_climate_data = []
# Iterate over each location and retrieve climate data
for index, row in pedestrian_sensor_locations.iterrows():
    latitude = row['latitude']
    longitude = row['longitude']
    climate_data = get_climate_data(latitude, longitude)
    all_climate_data.append(climate_data)
# Concatenate all climate dataframes into a single dataframe
climate_data_combined = pd.concat(all_climate_data, ignore_index=True)
# Print the combined climate data
climate_data_combined = climate_data_combined.rename(columns={'date': 'timestamp'})
climate_data_combined
| latitude | longitude | timestamp | temperature_2m | relative_humidity_2m | precipitation | rain | showers | weather_code | uv_index | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -37.813494 | 144.965153 | 2024-06-19 00:00:00+00:00 | 5.567000 | 73.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.25 |
| 1 | -37.813494 | 144.965153 | 2024-06-19 01:00:00+00:00 | 8.017000 | 65.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.10 |
| 2 | -37.813494 | 144.965153 | 2024-06-19 02:00:00+00:00 | 10.066999 | 59.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.55 |
| 3 | -37.813494 | 144.965153 | 2024-06-19 03:00:00+00:00 | 11.266999 | 56.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.45 |
| 4 | -37.813494 | 144.965153 | 2024-06-19 04:00:00+00:00 | 12.016999 | 48.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.45 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 316003 | -37.817724 | 144.950255 | 2024-09-25 19:00:00+00:00 | 4.467000 | 85.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.00 |
| 316004 | -37.817724 | 144.950255 | 2024-09-25 20:00:00+00:00 | 4.067000 | 84.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.00 |
| 316005 | -37.817724 | 144.950255 | 2024-09-25 21:00:00+00:00 | 4.417000 | 81.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.05 |
| 316006 | -37.817724 | 144.950255 | 2024-09-25 22:00:00+00:00 | 6.267000 | 74.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.75 |
| 316007 | -37.817724 | 144.950255 | 2024-09-25 23:00:00+00:00 | 8.917000 | 66.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.70 |
316008 rows × 10 columns
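Open-Meteo resolves nearby coordinates to the same forecast grid cell, so looping over every sensor can trigger many redundant API calls. One possible optimization — sketched here with an assumed 2-decimal rounding (roughly 1 km) and a helper name of my own, `get_climate_data_cached` — is to cache responses per rounded coordinate pair:

```python
_climate_cache = {}

def get_climate_data_cached(latitude, longitude, fetch):
    """Fetch hourly climate data, reusing results for nearby coordinates.

    `fetch` is the retrieval function, e.g. the get_climate_data defined above.
    The 2-decimal rounding granularity is an assumption, not an Open-Meteo guarantee.
    """
    key = (round(latitude, 2), round(longitude, 2))  # ~1 km buckets (assumption)
    if key not in _climate_cache:
        _climate_cache[key] = fetch(*key)
    # Re-label with the exact sensor coordinates so later merges on lat/lon still work
    return _climate_cache[key].assign(latitude=latitude, longitude=longitude)
```

The loop body would then call `get_climate_data_cached(row['latitude'], row['longitude'], get_climate_data)` instead of `get_climate_data` directly.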
In this code chunk, I prepare and filter the climate data to focus on the most recent month. First, I convert the timestamp column to a timezone-aware datetime format (UTC). Then, I define the analysis period by setting the end_date to the current date and calculating the start_date as one month prior. I filter the climate data (climate_data_combined) to include only records within this one-month period, resulting in filtered_climate_data_last_month. This filtered dataset provides a focused view of recent climate conditions, which is crucial for analyzing how current weather patterns might influence pedestrian behavior and safety.
# Ensure 'timestamp' is a timezone-aware datetime in UTC (utc=True handles both naive and aware input)
climate_data_combined['timestamp'] = pd.to_datetime(climate_data_combined['timestamp'], utc=True)
# Define the end date as today
end_date = pd.Timestamp.now(tz='UTC')
# Define the start date as one month before the end date
start_date = end_date - pd.DateOffset(months=1)
# Filter the DataFrame for the last month
filtered_climate_data_last_month = climate_data_combined[
    (climate_data_combined['timestamp'] >= start_date) &
    (climate_data_combined['timestamp'] <= end_date)
]
# Display the filtered data for the last month
filtered_climate_data_last_month.head()
| latitude | longitude | timestamp | temperature_2m | relative_humidity_2m | precipitation | rain | showers | weather_code | uv_index | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1465 | -37.813494 | 144.965153 | 2024-08-19 01:00:00+00:00 | 13.367000 | 73.0 | 0.0 | 0.0 | 0.0 | 1.0 | 3.50 |
| 1466 | -37.813494 | 144.965153 | 2024-08-19 02:00:00+00:00 | 16.317001 | 63.0 | 0.0 | 0.0 | 0.0 | 1.0 | 4.25 |
| 1467 | -37.813494 | 144.965153 | 2024-08-19 03:00:00+00:00 | 17.867001 | 58.0 | 0.0 | 0.0 | 0.0 | 2.0 | 4.50 |
| 1468 | -37.813494 | 144.965153 | 2024-08-19 04:00:00+00:00 | 18.467001 | 54.0 | 0.0 | 0.0 | 0.0 | 2.0 | 3.05 |
| 1469 | -37.813494 | 144.965153 | 2024-08-19 05:00:00+00:00 | 18.067001 | 57.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.30 |
Check the time range¶
# Work on a copy so the filtered slice can be modified without chained-assignment warnings
filtered_climate_data_last_month = filtered_climate_data_last_month.copy()
filtered_climate_data_last_month['timestamp'] = pd.to_datetime(filtered_climate_data_last_month['timestamp'], utc=True)
# Find the earliest and latest timestamps
earliest_timestamp = filtered_climate_data_last_month['timestamp'].min()
latest_timestamp = filtered_climate_data_last_month['timestamp'].max()
print("Earliest Timestamp:", earliest_timestamp)
print("Latest Timestamp:", latest_timestamp)
Earliest Timestamp: 2024-08-19 01:00:00+00:00 Latest Timestamp: 2024-09-19 00:00:00+00:00
Merge pedestrian dataset with climate dataset on timestamp and location data¶
merged_data = pd.merge(filtered_climate_data_last_month, filtered_data_last_month, on=['timestamp', 'latitude', 'longitude'])
merged_data
| latitude | longitude | timestamp | temperature_2m | relative_humidity_2m | precipitation | rain | showers | weather_code | uv_index | ... | sensor_description | sensor_name | installation_date | note | location_type | status | direction_1_y | direction_2_y | location | date_only | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -37.813494 | 144.965153 | 2024-09-14 14:00:00+00:00 | 8.367000 | 75.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.00 | ... | Bourke Street Mall (North) | Bou292_T | 2009-03-24 | NaN | Outdoor | A | East | West | -37.81349441, 144.96515323 | 2024-09-14 |
| 1 | -37.813494 | 144.965153 | 2024-09-14 21:00:00+00:00 | 7.567000 | 78.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.05 | ... | Bourke Street Mall (North) | Bou292_T | 2009-03-24 | NaN | Outdoor | A | East | West | -37.81349441, 144.96515323 | 2024-09-14 |
| 2 | -37.813494 | 144.965153 | 2024-09-15 00:00:00+00:00 | 11.117000 | 56.0 | 0.0 | 0.0 | 0.0 | 2.0 | 3.55 | ... | Bourke Street Mall (North) | Bou292_T | 2009-03-24 | NaN | Outdoor | A | East | West | -37.81349441, 144.96515323 | 2024-09-15 |
| 3 | -37.813494 | 144.965153 | 2024-09-15 01:00:00+00:00 | 11.917000 | 50.0 | 0.0 | 0.0 | 0.0 | 2.0 | 4.75 | ... | Bourke Street Mall (North) | Bou292_T | 2009-03-24 | NaN | Outdoor | A | East | West | -37.81349441, 144.96515323 | 2024-09-15 |
| 4 | -37.813494 | 144.965153 | 2024-09-15 02:00:00+00:00 | 12.316999 | 48.0 | 0.0 | 0.0 | 0.0 | 2.0 | 5.55 | ... | Bourke Street Mall (North) | Bou292_T | 2009-03-24 | NaN | Outdoor | A | East | West | -37.81349441, 144.96515323 | 2024-09-15 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6049 | -37.819973 | 144.958349 | 2024-09-17 22:00:00+00:00 | 8.667000 | 62.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.00 | ... | Awning of Nationwide Parking 474 Flinders Street | 474Fl_T | 2023-11-10 | NaN | Outdoor | A | East | West | -37.81997273, 144.95834911 | 2024-09-17 |
| 6050 | -37.819973 | 144.958349 | 2024-09-17 23:00:00+00:00 | 11.066999 | 55.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.40 | ... | Awning of Nationwide Parking 474 Flinders Street | 474Fl_T | 2023-11-10 | NaN | Outdoor | A | East | West | -37.81997273, 144.95834911 | 2024-09-17 |
| 6051 | -37.819973 | 144.958349 | 2024-09-18 00:00:00+00:00 | 15.117000 | 40.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.95 | ... | Awning of Nationwide Parking 474 Flinders Street | 474Fl_T | 2023-11-10 | NaN | Outdoor | A | East | West | -37.81997273, 144.95834911 | 2024-09-18 |
| 6052 | -37.819973 | 144.958349 | 2024-09-18 01:00:00+00:00 | 17.767000 | 29.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.25 | ... | Awning of Nationwide Parking 474 Flinders Street | 474Fl_T | 2023-11-10 | NaN | Outdoor | A | East | West | -37.81997273, 144.95834911 | 2024-09-18 |
| 6053 | -37.819973 | 144.958349 | 2024-09-18 02:00:00+00:00 | 18.817001 | 26.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.05 | ... | Awning of Nationwide Parking 474 Flinders Street | 474Fl_T | 2023-11-10 | NaN | Outdoor | A | East | West | -37.81997273, 144.95834911 | 2024-09-18 |
6054 rows × 26 columns
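Because the join keys include floating-point coordinates, an exact inner merge can silently drop rows where latitude or longitude differ in the last decimal place. A quick sanity check can be sketched with pandas' merge `indicator` option; the helper name `merge_match_rate` is my own, and it assumes the two filtered DataFrames built above:

```python
import pandas as pd

def merge_match_rate(ped_df: pd.DataFrame, climate_df: pd.DataFrame) -> float:
    """Fraction of pedestrian rows that find an exact climate match."""
    check = pd.merge(
        ped_df, climate_df,
        on=['timestamp', 'latitude', 'longitude'],
        how='left',
        indicator=True,  # adds a '_merge' column: 'both' means the keys matched
    )
    return (check['_merge'] == 'both').mean()

# e.g. merge_match_rate(filtered_data_last_month, filtered_climate_data_last_month)
```

A match rate well below 1.0 would suggest rounding the coordinates (or merging on location_id instead) before joining.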
Reindexing the dataframe¶
#merged_data = merged_data.reindex(columns=['latitude', 'longitude','timestamp','location_id', 'direction_1_x', 'direction_2_x', 'total_of_directions', 'direction_1_y', 'direction_2_y','temperature_2m','relative_humidity_2m', 'precipitation', 'rain', 'showers','weather_code', 'uv_index', ])
merged_data
| latitude | longitude | timestamp | temperature_2m | relative_humidity_2m | precipitation | rain | showers | weather_code | uv_index | ... | sensor_description | sensor_name | installation_date | note | location_type | status | direction_1_y | direction_2_y | location | date_only | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -37.813494 | 144.965153 | 2024-09-14 14:00:00+00:00 | 8.367000 | 75.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.00 | ... | Bourke Street Mall (North) | Bou292_T | 2009-03-24 | NaN | Outdoor | A | East | West | -37.81349441, 144.96515323 | 2024-09-14 |
| 1 | -37.813494 | 144.965153 | 2024-09-14 21:00:00+00:00 | 7.567000 | 78.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.05 | ... | Bourke Street Mall (North) | Bou292_T | 2009-03-24 | NaN | Outdoor | A | East | West | -37.81349441, 144.96515323 | 2024-09-14 |
| 2 | -37.813494 | 144.965153 | 2024-09-15 00:00:00+00:00 | 11.117000 | 56.0 | 0.0 | 0.0 | 0.0 | 2.0 | 3.55 | ... | Bourke Street Mall (North) | Bou292_T | 2009-03-24 | NaN | Outdoor | A | East | West | -37.81349441, 144.96515323 | 2024-09-15 |
| 3 | -37.813494 | 144.965153 | 2024-09-15 01:00:00+00:00 | 11.917000 | 50.0 | 0.0 | 0.0 | 0.0 | 2.0 | 4.75 | ... | Bourke Street Mall (North) | Bou292_T | 2009-03-24 | NaN | Outdoor | A | East | West | -37.81349441, 144.96515323 | 2024-09-15 |
| 4 | -37.813494 | 144.965153 | 2024-09-15 02:00:00+00:00 | 12.316999 | 48.0 | 0.0 | 0.0 | 0.0 | 2.0 | 5.55 | ... | Bourke Street Mall (North) | Bou292_T | 2009-03-24 | NaN | Outdoor | A | East | West | -37.81349441, 144.96515323 | 2024-09-15 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6049 | -37.819973 | 144.958349 | 2024-09-17 22:00:00+00:00 | 8.667000 | 62.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.00 | ... | Awning of Nationwide Parking 474 Flinders Street | 474Fl_T | 2023-11-10 | NaN | Outdoor | A | East | West | -37.81997273, 144.95834911 | 2024-09-17 |
| 6050 | -37.819973 | 144.958349 | 2024-09-17 23:00:00+00:00 | 11.066999 | 55.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.40 | ... | Awning of Nationwide Parking 474 Flinders Street | 474Fl_T | 2023-11-10 | NaN | Outdoor | A | East | West | -37.81997273, 144.95834911 | 2024-09-17 |
| 6051 | -37.819973 | 144.958349 | 2024-09-18 00:00:00+00:00 | 15.117000 | 40.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.95 | ... | Awning of Nationwide Parking 474 Flinders Street | 474Fl_T | 2023-11-10 | NaN | Outdoor | A | East | West | -37.81997273, 144.95834911 | 2024-09-18 |
| 6052 | -37.819973 | 144.958349 | 2024-09-18 01:00:00+00:00 | 17.767000 | 29.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.25 | ... | Awning of Nationwide Parking 474 Flinders Street | 474Fl_T | 2023-11-10 | NaN | Outdoor | A | East | West | -37.81997273, 144.95834911 | 2024-09-18 |
| 6053 | -37.819973 | 144.958349 | 2024-09-18 02:00:00+00:00 | 18.817001 | 26.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6.05 | ... | Awning of Nationwide Parking 474 Flinders Street | 474Fl_T | 2023-11-10 | NaN | Outdoor | A | East | West | -37.81997273, 144.95834911 | 2024-09-18 |
6054 rows × 26 columns
Check the time range¶
# Find the earliest and latest timestamps
earliest_timestamp = merged_data['timestamp'].min()
latest_timestamp = merged_data['timestamp'].max()
print("Earliest Timestamp:", earliest_timestamp)
print("Latest Timestamp:", latest_timestamp)
Earliest Timestamp: 2024-09-14 14:00:00+00:00 Latest Timestamp: 2024-09-18 02:00:00+00:00
Create a base map centered around Melbourne¶
In this code, I create an interactive map centered on Melbourne using Folium. The code filters out duplicate latitude and longitude coordinates from the merged_data dataset to ensure each location is represented only once on the map. I then add markers for each unique location, visually indicating the positions of interest across the city. This map provides a clear geographic overview of the key locations involved in the analysis, which could include pedestrian sensors, footpaths, or climate data points. It serves as a foundational tool for exploring spatial patterns and relationships in the data.
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=14)
# Filter out duplicate latitude and longitude coordinates
unique_coordinates = merged_data[['latitude', 'longitude']].drop_duplicates().values.tolist()
# Add unique coordinates as markers on the map
for lat, lon in unique_coordinates:
folium.Marker(location=[lat, lon]).add_to(melbourne_map)
melbourne_map
Total of Directions Heat Map for selected date¶
This code enables interactive visualization of pedestrian traffic data on a Melbourne map. Users can select a date, and a heatmap reflecting pedestrian traffic for that day is dynamically generated and displayed. The setup uses a date-picker widget for easy date selection and a function to update the map accordingly, enhancing the analysis and planning of urban pedestrian safety measures.
# Function to update the map based on the selected date
def update_map(selected_date):
    if selected_date is None:  # no date picked yet
        return
    selected_day_df = merged_data[merged_data['timestamp'].dt.date == selected_date]
    pedestrian_data = selected_day_df[['latitude', 'longitude', 'total_of_directions']].values.tolist()
    # Create base map centered around Melbourne
    melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=16)
    # Add heatmap layer using pedestrian data
    HeatMap(pedestrian_data).add_to(melbourne_map)
    # Save the map as HTML
    # melbourne_map.save("melbourne_heatmap.html")
    display(melbourne_map)
# Create a widget to select the date
date_picker = widgets.DatePicker(description='Select Date', disabled=False)
# Display the time range
print("Pick a date from the last 30 days:")
# Display the widget and the interactive map
interact(update_map, selected_date=date_picker);
Pick a date from the last 30 days:
interactive(children=(DatePicker(value=None, description='Select Date', step=1), Output()), _dom_classes=('wid…
HeatMap with Climate Data for selected date¶
This code allows for an interactive exploration of pedestrian and climate data on a map, based on a user-selected date. The update_map function filters the merged dataset for the chosen date and visualizes the data on a Folium map centered on Melbourne. A heatmap layer is added to represent pedestrian activity, showing areas of high foot traffic. Additional heatmap layers visualize different climate variables, such as humidity, precipitation, rain, showers, and UV index, allowing users to explore the relationship between weather conditions and pedestrian activity. A date picker widget enables users to easily select a specific date, updating the map accordingly. This interactive tool is key for analyzing daily variations in pedestrian behavior and how they correlate with weather patterns, aiding in the identification of trends that can inform urban planning and safety strategies.
# Function to update the map based on the selected date
def update_map(selected_date):
    if selected_date is None:  # no date picked yet
        return
    # Filter data for the selected date
    selected_day_df = merged_data[merged_data['timestamp'].dt.date == selected_date]
    # Extract pedestrian data
    pedestrian_data = selected_day_df[['latitude', 'longitude', 'total_of_directions']].values.tolist()
    # Create base map centered around Melbourne
    melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=16)
    # Add heatmap layer for pedestrian data
    HeatMap(pedestrian_data, name='Pedestrian Heatmap').add_to(melbourne_map)
    # Add climate data layers
    climate_layers = {
        'Relative Humidity 2m': 'relative_humidity_2m',
        'Precipitation': 'precipitation',
        'Rain': 'rain',
        'Showers': 'showers',
        'UV Index': 'uv_index'
    }
    for layer_name, layer_column in climate_layers.items():
        climate_data = selected_day_df[['latitude', 'longitude', layer_column]].values.tolist()
        HeatMap(climate_data, name=layer_name).add_to(melbourne_map)
    # Add layer control
    folium.LayerControl().add_to(melbourne_map)
    # melbourne_map.save("melbourne_heatmap.html")
    display(melbourne_map)
merged_data.dropna(inplace=True)
# Get unique dates from the DataFrame
unique_dates = merged_data['timestamp'].dt.date.unique()
# Create a widget to select the date
date_picker = widgets.DatePicker(description='Select Date', disabled=False)
# Display the time range
print("Time Range:")
print("Earliest Timestamp:", earliest_timestamp)
print("Latest Timestamp:", latest_timestamp)
# Display the widget and the interactive map
interact(update_map, selected_date=date_picker);
Time Range: Earliest Timestamp: 2024-09-14 14:00:00+00:00 Latest Timestamp: 2024-09-18 02:00:00+00:00
interactive(children=(DatePicker(value=None, description='Select Date', step=1), Output()), _dom_classes=('wid…
In this code, I calculate summary statistics for the distribution of missing values (NaNs) across the merged_data DataFrame.
# Calculate summary statistics of NaN values
nan_dispersion = merged_data.isnull().sum().describe()
print(nan_dispersion)
count 26.0 mean 0.0 std 0.0 min 0.0 25% 0.0 50% 0.0 75% 0.0 max 0.0 dtype: float64
Drop NaN values¶
This code snippet removes rows with missing values from the merged DataFrame and then calculates summary statistics for the cleaned data. These statistics describe the central tendency, dispersion, and shape of the dataset's distribution, aiding data analysis and decision-making. The summary is printed, offering a detailed overview of the available data's characteristics.
# Drop NaN values
cleaned_df = merged_data.dropna()
# Summary statistics of the available data
summary = cleaned_df.describe()
# Print the summary statistics
print(summary)
latitude longitude temperature_2m relative_humidity_2m \
count 1572.000000 1572.000000 1572.000000 1572.000000
mean -37.812449 144.965619 11.542317 63.357506
std 0.006253 0.004357 2.687549 14.680360
min -37.820178 144.954527 6.817000 26.000000
25% -37.818742 144.962578 9.217000 50.750000
50% -37.813625 144.966094 11.967000 66.000000
75% -37.809993 144.968729 12.816999 75.000000
max -37.796987 144.973297 18.817001 91.000000
precipitation rain showers weather_code uv_index \
count 1572.000000 1572.000000 1572.000000 1572.00000 1572.000000
mean 0.026018 0.019275 0.006743 7.80598 1.928880
std 0.086468 0.082011 0.031798 18.38876 2.062064
min 0.000000 0.000000 0.000000 0.00000 0.000000
25% 0.000000 0.000000 0.000000 2.00000 0.000000
50% 0.000000 0.000000 0.000000 2.00000 1.350000
75% 0.000000 0.000000 0.000000 3.00000 3.550000
max 0.500000 0.500000 0.200000 80.00000 6.050000
location_id direction_1_x direction_2_x total_of_directions
count 1572.000000 1572.000000 1572.000000 1572.000000
mean 34.390585 5.531807 5.801527 11.333333
std 17.345043 12.415402 12.446143 23.398729
min 5.000000 0.000000 0.000000 1.000000
25% 23.000000 1.000000 1.000000 2.000000
50% 37.000000 2.000000 2.000000 5.000000
75% 45.000000 6.000000 6.000000 11.000000
max 123.000000 173.000000 127.000000 252.000000
Mean Traffic Volume with weather variables for each hour¶
In this code, I analyze the relationship between hourly pedestrian traffic and weather conditions by first converting the timestamp column to a datetime format and then extracting the hour from each timestamp. The data is grouped by hour, and the mean values for pedestrian traffic (total_of_directions) and various weather variables (e.g., rain, temperature, humidity) are calculated.
The results are visualized using a combination of bar and line plots on the same graph. The bar chart represents the mean pedestrian traffic volume by hour, while the line plots overlay the weather variables, allowing for a comparative analysis. The dual-axis plot helps illustrate how weather conditions fluctuate throughout the day and how these fluctuations might correlate with changes in pedestrian traffic. This visualization is critical for identifying patterns in pedestrian behavior in response to weather conditions, providing insights that could be used in urban planning and public safety strategies.
# Convert 'timestamp' column to datetime format
merged_data['timestamp'] = pd.to_datetime(merged_data['timestamp'])
# Extract hour from timestamp
merged_data['hour'] = merged_data['timestamp'].dt.hour
# Group data by hour and calculate mean traffic volume and mean weather variables
hourly_data = merged_data.groupby('hour').agg({
    'total_of_directions': 'mean',
    'rain': 'mean',
    'temperature_2m': 'mean',
    'relative_humidity_2m': 'mean',
    'precipitation': 'mean',
    'showers': 'mean',
    'uv_index': 'mean'
})
# Plot bar chart
fig, ax1 = plt.subplots()
# Bar for traffic volume
color = 'tab:blue'
ax1.set_xlabel('Hour of the Day')
ax1.set_ylabel('Mean Traffic Volume', color=color)
ax1.bar(hourly_data.index, hourly_data['total_of_directions'], color=color, alpha=0.7)
ax1.tick_params(axis='y', labelcolor=color)
# Create another y-axis for weather variables
ax2 = ax1.twinx()
# Line plot for rain
color = 'tab:red'
ax2.set_ylabel('Weather Variables', color=color)
ax2.plot(hourly_data.index, hourly_data['rain'], color=color, linestyle='-', marker='o', label='Rain')
ax2.plot(hourly_data.index, hourly_data['temperature_2m'], color='green', linestyle='-', marker='o', label='Temperature')
ax2.plot(hourly_data.index, hourly_data['relative_humidity_2m'], color='orange', linestyle='-', marker='o', label='Relative Humidity')
ax2.plot(hourly_data.index, hourly_data['precipitation'], color='purple', linestyle='-', marker='o', label='Precipitation')
ax2.plot(hourly_data.index, hourly_data['showers'], color='brown', linestyle='-', marker='o', label='Showers')
ax2.plot(hourly_data.index, hourly_data['uv_index'], color='blue', linestyle='-', marker='o', label='UV Index')
ax2.tick_params(axis='y', labelcolor=color)
# Add legend
fig.tight_layout()
fig.legend(loc="upper left", bbox_to_anchor=(0.15,0.88))
# Show plot
plt.title('Mean Traffic Volume and Weather Variables by Hour')
plt.xticks(range(24))
plt.show()
Mean hourly precipitation¶
In this code, I analyze the distribution of various weather variables across different hours of the day. By first extracting the hour from the timestamp column, I can group the data by hour and calculate the mean value for each weather measure (such as rain, precipitation, showers, UV index, temperature, and relative humidity).
The code then creates a series of histograms—one for each weather variable—displaying the frequency distribution of their mean hourly values. These histograms are arranged in a 3x3 grid of subplots, providing a comprehensive visual summary of how these weather conditions vary throughout the day. This analysis is crucial for understanding the typical daily patterns in weather conditions and how they might impact pedestrian behavior and safety. By visualizing these patterns, I can identify trends and anomalies that could inform more targeted urban planning and public safety measures.
# Extract hour from timestamp
merged_data['hour'] = merged_data['timestamp'].dt.hour
# Define weather measures to plot (the name is kept for reuse below; the list also includes non-precipitation variables)
precipitation_measures = ['rain', 'precipitation', 'showers', 'uv_index',
                          'temperature_2m', 'relative_humidity_2m']
# Create subplots
fig, axes = plt.subplots(3, 3, figsize=(15, 15))
# Flatten the axes array for easy iteration
axes = axes.flatten()
# Iterate over precipitation measures
for i, measure in enumerate(precipitation_measures):
    # Group data by hour and calculate the mean for the current measure
    hourly_data = merged_data.groupby('hour')[measure].mean()
    # Plot histogram of the mean hourly values
    axes[i].hist(hourly_data, bins=20, color='green', alpha=0.7)
    axes[i].set_xlabel(f'Mean Hourly {measure.capitalize()}')
    axes[i].set_ylabel('Frequency')
    axes[i].set_title(f'Histogram of Mean Hourly {measure.capitalize()}')
# Hide empty subplots (if any)
for ax in axes[len(precipitation_measures):]:
    ax.axis('off')
plt.tight_layout()
plt.show()
Traffic Volume with other features¶
In this code, I explore the relationship between pedestrian traffic volume and various weather variables by creating scatter plots and overlaying trend lines for each variable. The function plot_trend_line calculates and plots a regression line for the data, indicating the strength and direction of the relationship between traffic volume and each weather measure.
I create a grid of subplots, each displaying a scatter plot where traffic volume is plotted against one of the weather variables (e.g., rain, precipitation, UV index). The trend line, with its corresponding R² value, is added to each plot to quantify the correlation.
This analysis provides insights into how different weather conditions impact pedestrian traffic. For instance, a negative trend in the plot of traffic volume versus precipitation might indicate that pedestrian traffic decreases as rainfall increases. By visually assessing these relationships, I can better understand how weather factors influence pedestrian behavior, which is valuable for urban planning and public safety efforts.
# Define function to calculate regression line
def plot_trend_line(x, y, ax):
    if len(np.unique(x)) > 1:
        slope, intercept, r_value, p_value, std_err = linregress(x, y)
        line = slope * x + intercept
        ax.plot(x, line, color='blue', linestyle='--', label=f'Trend Line (R²={r_value**2:.2f})')
        ax.legend()
    else:
        ax.text(0.5, 0.5, 'Insufficient variation in data for regression',
                horizontalalignment='center', verticalalignment='center',
                transform=ax.transAxes, color='red')
# Create subplots
fig, axes = plt.subplots(3, 3, figsize=(15, 15))
# Flatten the axes array for easy iteration
axes = axes.flatten()
# Iterate over precipitation measures
for i, measure in enumerate(precipitation_measures):
    # Scatter plot of traffic volume vs. the current measure
    axes[i].scatter(merged_data[measure], merged_data['total_of_directions'], color='red', alpha=0.5)
    axes[i].set_xlabel(measure.capitalize())
    axes[i].set_ylabel('Traffic Volume')
    axes[i].set_title(f'Traffic Volume vs. {measure.capitalize()}')
    # Calculate and plot trend line
    plot_trend_line(merged_data[measure], merged_data['total_of_directions'], axes[i])
# Hide empty subplots (if any)
for ax in axes[len(precipitation_measures):]:
    ax.axis('off')
plt.tight_layout()
plt.show()
I create a copy of the merged_data DataFrame and store it in a new variable called normalized_data. This step is a preparatory measure, allowing me to perform normalization or other transformations on the dataset without altering the original data.
normalized_data = merged_data.copy()
I normalize specific columns in the normalized_data DataFrame using the MinMaxScaler from scikit-learn. The columns selected for normalization include key variables such as pedestrian traffic volume, temperature, humidity, precipitation, and UV index. Normalization scales these features to a range between 0 and 1, which is particularly useful for preparing the data for machine learning models that require input features on a comparable scale. After applying the transformation, the DataFrame retains its structure, but the values in the specified columns are now normalized, ensuring consistency and improving the performance of any subsequent analysis or modeling tasks.
# Initialize the MinMaxScaler
scaler = MinMaxScaler()
# List of columns to normalize
columns_to_normalize = ['total_of_directions', 'temperature_2m', 'relative_humidity_2m',
                        'precipitation', 'rain', 'showers', 'weather_code', 'uv_index']
# Fit the scaler on the data and transform it, keeping the DataFrame structure
normalized_data[columns_to_normalize] = scaler.fit_transform(normalized_data[columns_to_normalize])
# Print the scaled DataFrame
print(normalized_data)
latitude longitude timestamp temperature_2m \
709 -37.818742 144.967877 2024-09-14 14:00:00+00:00 0.129167
710 -37.818742 144.967877 2024-09-14 15:00:00+00:00 0.104167
711 -37.818742 144.967877 2024-09-14 16:00:00+00:00 0.079167
712 -37.818742 144.967877 2024-09-14 17:00:00+00:00 0.066667
713 -37.818742 144.967877 2024-09-14 18:00:00+00:00 0.050000
... ... ... ... ...
5223 -37.808418 144.959063 2024-09-17 11:00:00+00:00 0.375000
5224 -37.808418 144.959063 2024-09-17 23:00:00+00:00 0.354167
5225 -37.808418 144.959063 2024-09-18 00:00:00+00:00 0.691667
5226 -37.808418 144.959063 2024-09-18 01:00:00+00:00 0.912500
5227 -37.808418 144.959063 2024-09-18 02:00:00+00:00 1.000000
relative_humidity_2m precipitation rain showers weather_code \
709 0.753846 0.0 0.0 0.0 0.0250
710 0.815385 0.2 0.0 0.5 0.0250
711 0.861538 0.0 0.0 0.0 0.0125
712 0.876923 0.0 0.0 0.0 0.0250
713 0.815385 0.0 0.0 0.0 0.0125
... ... ... ... ... ...
5223 0.661538 0.0 0.0 0.0 0.0250
5224 0.446154 0.0 0.0 0.0 0.0000
5225 0.215385 0.0 0.0 0.0 0.0000
5226 0.046154 0.0 0.0 0.0 0.0000
5227 0.000000 0.0 0.0 0.0 0.0000
uv_index ... sensor_name installation_date \
709 0.000000 ... PriNW_T 2009-03-26
710 0.000000 ... PriNW_T 2009-03-26
711 0.000000 ... PriNW_T 2009-03-26
712 0.000000 ... PriNW_T 2009-03-26
713 0.000000 ... PriNW_T 2009-03-26
... ... ... ... ...
5223 0.000000 ... Fra118_T 2017-11-30
5224 0.396694 ... Fra118_T 2017-11-30
5225 0.652893 ... Fra118_T 2017-11-30
5226 0.867769 ... Fra118_T 2017-11-30
5227 1.000000 ... Fra118_T 2017-11-30
note location_type status direction_1_y \
709 Replace with: 00:6e:02:01:9e:54 Outdoor A North
710 Replace with: 00:6e:02:01:9e:54 Outdoor A North
711 Replace with: 00:6e:02:01:9e:54 Outdoor A North
712 Replace with: 00:6e:02:01:9e:54 Outdoor A North
713 Replace with: 00:6e:02:01:9e:54 Outdoor A North
... ... ... ... ...
5223 Pushbox Upgrade, 20/07/2023 Outdoor A East
5224 Pushbox Upgrade, 20/07/2023 Outdoor A East
5225 Pushbox Upgrade, 20/07/2023 Outdoor A East
5226 Pushbox Upgrade, 20/07/2023 Outdoor A East
5227 Pushbox Upgrade, 20/07/2023 Outdoor A East
direction_2_y location date_only hour
709 South -37.81874249, 144.96787656 2024-09-14 14
710 South -37.81874249, 144.96787656 2024-09-14 15
711 South -37.81874249, 144.96787656 2024-09-14 16
712 South -37.81874249, 144.96787656 2024-09-14 17
713 South -37.81874249, 144.96787656 2024-09-14 18
... ... ... ... ...
5223 West -37.80841815, 144.95906316 2024-09-17 11
5224 West -37.80841815, 144.95906316 2024-09-17 23
5225 West -37.80841815, 144.95906316 2024-09-18 0
5226 West -37.80841815, 144.95906316 2024-09-18 1
5227 West -37.80841815, 144.95906316 2024-09-18 2
[1572 rows x 27 columns]
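As a sanity check on the scaling step above, each normalized column should now span [0, 1]. A minimal sketch on toy values (not the project data) illustrates the check:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy values standing in for two of the normalized columns
df = pd.DataFrame({'temperature_2m': [6.8, 11.9, 18.8],
                   'uv_index': [0.0, 1.35, 6.05]})

scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# After scaling, every column's minimum is 0 and maximum is 1
print(scaled.min().round(6).tolist(), scaled.max().round(6).tolist())
```

The same check on `normalized_data[columns_to_normalize]` would confirm the transform before modeling.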
Check Unique Values¶
In this code, I examine the distinct values present in the direction_1_y and direction_2_y columns of the normalized_data DataFrame. By using the unique() function, I extract and print all unique values for each column. This step is essential for understanding the range and variety of directional data recorded in the dataset, which could represent pedestrian movement patterns or specific directions measured by sensors. Identifying these unique values helps in further analysis, such as categorizing or filtering data based on movement direction, and it provides insights into the distribution and diversity of the directional data.
unique_values = normalized_data['direction_1_y'].unique()
print(unique_values)
unique_values2 = normalized_data['direction_2_y'].unique()
print(unique_values2)
['North' 'East' 'In']
['South' 'West' 'Out' 'InOut']
# Display the data types of all columns
print(normalized_data.dtypes)
latitude                              float64
longitude                             float64
timestamp                 datetime64[ns, UTC]
temperature_2m                        float64
relative_humidity_2m                  float64
precipitation                         float64
rain                                  float64
showers                               float64
weather_code                          float64
uv_index                              float64
location_id                             int64
sensing_date                           object
sensing_time                           object
direction_1_x                           int64
direction_2_x                           int64
total_of_directions                   float64
sensor_description                     object
sensor_name                            object
installation_date                      object
note                                   object
location_type                          object
status                                 object
direction_1_y                          object
direction_2_y                          object
location                               object
date_only                              object
hour                                    int32
dtype: object
One-Hot Encoder for direction_1_y and direction_2_y¶
In this code, I use the OneHotEncoder from scikit-learn to transform the categorical direction_1_y and direction_2_y columns into a set of binary (one-hot encoded) features. This transformation is essential for converting categorical data into a format suitable for machine learning models, which typically require numerical inputs.
First, I check if the specified columns exist in the DataFrame to prevent errors. If they do, I apply the encoder to these columns, generating a new set of features that represent the presence or absence of each unique category in binary form. These new features are then added back to the normalized_data DataFrame, replacing the original categorical columns.
The resulting DataFrame now includes these one-hot encoded features, which allows me to use directional data in predictive models or further analysis without the limitations posed by non-numerical values. This step enhances the dataset's usability for various machine learning and analytical tasks.
from sklearn.preprocessing import OneHotEncoder
# Initialize the OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)  # sparse_output=False returns a dense numpy array
# Check if columns exist to avoid KeyError
if {'direction_1_y', 'direction_2_y'}.issubset(normalized_data.columns):
    # Fit and transform the data
    encoded_data = encoder.fit_transform(normalized_data[['direction_1_y', 'direction_2_y']])
    # Get the feature names from the encoder
    encoded_feature_names = encoder.get_feature_names_out(['direction_1_y', 'direction_2_y'])
    # Create a DataFrame with the encoded data, keeping the original index so concat aligns row-for-row
    encoded_df = pd.DataFrame(encoded_data, columns=encoded_feature_names, index=normalized_data.index)
    # Concatenate the encoded data back to the original DataFrame
    normalized_data = pd.concat([normalized_data.drop(['direction_1_y', 'direction_2_y'], axis=1), encoded_df], axis=1)
    print(normalized_data.head())
else:
    print("Columns 'direction_1_y' or 'direction_2_y' are not found in the DataFrame.")
latitude longitude timestamp temperature_2m \
709 -37.818742 144.967877 2024-09-14 14:00:00+00:00 0.129167
710 -37.818742 144.967877 2024-09-14 15:00:00+00:00 0.104167
711 -37.818742 144.967877 2024-09-14 16:00:00+00:00 0.079167
712 -37.818742 144.967877 2024-09-14 17:00:00+00:00 0.066667
713 -37.818742 144.967877 2024-09-14 18:00:00+00:00 0.050000
relative_humidity_2m precipitation rain showers weather_code \
709 0.753846 0.0 0.0 0.0 0.0250
710 0.815385 0.2 0.0 0.5 0.0250
711 0.861538 0.0 0.0 0.0 0.0125
712 0.876923 0.0 0.0 0.0 0.0250
713 0.815385 0.0 0.0 0.0 0.0125
uv_index ... location date_only hour \
709 0.0 ... -37.81874249, 144.96787656 2024-09-14 14.0
710 0.0 ... -37.81874249, 144.96787656 2024-09-14 15.0
711 0.0 ... -37.81874249, 144.96787656 2024-09-14 16.0
712 0.0 ... -37.81874249, 144.96787656 2024-09-14 17.0
713 0.0 ... -37.81874249, 144.96787656 2024-09-14 18.0
direction_1_y_East direction_1_y_In direction_1_y_North \
709 1.0 0.0 0.0
710 1.0 0.0 0.0
711 1.0 0.0 0.0
712 1.0 0.0 0.0
713 1.0 0.0 0.0
direction_2_y_InOut direction_2_y_Out direction_2_y_South \
709 0.0 0.0 0.0
710 0.0 0.0 0.0
711 0.0 0.0 0.0
712 0.0 0.0 0.0
713 0.0 0.0 0.0
direction_2_y_West
709 1.0
710 1.0
711 1.0
712 1.0
713 1.0
[5 rows x 32 columns]
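One pitfall worth noting here: a DataFrame built from the encoder's numpy output gets a fresh RangeIndex, while this notebook's frame has an index starting at 709, so `pd.concat(axis=1)` can silently misalign rows and introduce NaNs. A minimal sketch (toy data, hypothetical column name) of the failure mode and its fix:

```python
import pandas as pd

# Original frame whose index does not start at 0, mimicking merged_data after filtering
orig = pd.DataFrame({'x': [1, 2, 3]}, index=[709, 710, 711])
encoded = pd.DataFrame({'d_East': [1.0, 0.0, 1.0]})  # default RangeIndex 0..2

# Misaligned: concat matches on index labels, producing extra NaN-padded rows
bad = pd.concat([orig, encoded], axis=1)

# Fix: carry the original index over to the encoded frame first
encoded.index = orig.index
good = pd.concat([orig, encoded], axis=1)

print(bad.shape, good.shape)  # the misaligned result has 6 rows, the fixed one 3
```

Passing `index=...` when constructing the encoded DataFrame, as done above, achieves the same alignment in one step.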
Check correlation between weather features and total of direction¶
In this code, I examine the correlation between various weather features and pedestrian traffic volume (total_of_directions). By selecting relevant columns—such as temperature_2m, relative_humidity_2m, precipitation, showers, uv_index, and total_of_directions—I calculate the correlation matrix to quantify the relationships between these variables.
The correlation matrix provides a clear overview of how each weather variable is related to pedestrian traffic, with values ranging from -1 (strong negative correlation) to 1 (strong positive correlation). This analysis helps identify which weather factors most strongly influence pedestrian movement, offering valuable insights for urban planning and safety measures aimed at improving pedestrian experiences under varying weather conditions.
selected_columns = ['temperature_2m', 'relative_humidity_2m', 'precipitation', 'showers', 'uv_index','total_of_directions']
correlation_matrix = normalized_data[selected_columns].corr()
# Display the correlation matrix
correlation_matrix
| | temperature_2m | relative_humidity_2m | precipitation | showers | uv_index | total_of_directions |
|---|---|---|---|---|---|---|
| temperature_2m | 1.000000 | -0.702381 | 0.062232 | 0.023698 | 0.669268 | 0.139282 |
| relative_humidity_2m | -0.702381 | 1.000000 | 0.260295 | 0.222140 | -0.787364 | -0.125395 |
| precipitation | 0.062232 | 0.260295 | 1.000000 | 0.320452 | -0.067995 | 0.028305 |
| showers | 0.023698 | 0.222140 | 0.320452 | 1.000000 | -0.154605 | 0.023669 |
| uv_index | 0.669268 | -0.787364 | -0.067995 | -0.154605 | 1.000000 | 0.134851 |
| total_of_directions | 0.139282 | -0.125395 | 0.028305 | 0.023669 | 0.134851 | 1.000000 |
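Beyond eyeballing the matrix, the strongly correlated pairs (here temperature_2m vs relative_humidity_2m at −0.70 and uv_index vs relative_humidity_2m at −0.79) can be listed programmatically. This helper is a sketch run on an illustrative matrix, not the notebook's full data:

```python
import pandas as pd

def strong_pairs(corr: pd.DataFrame, threshold: float = 0.7):
    """Return off-diagonal variable pairs with |r| >= threshold."""
    pairs = []
    cols = corr.columns
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs

# Illustrative correlation matrix with values close to the ones above
names = ['temperature_2m', 'relative_humidity_2m', 'uv_index']
corr = pd.DataFrame(
    [[1.0, -0.70, 0.67], [-0.70, 1.0, -0.79], [0.67, -0.79, 1.0]],
    columns=names, index=names)

pairs_found = strong_pairs(corr)
print(pairs_found)
```

Any pair this flags is a candidate for the multicollinearity handling discussed in the next step.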
# plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, fmt=".2f", cmap='coolwarm',annot_kws={"size": 8})
plt.title('Correlation Matrix')
plt.show()
Check for multicollinearity¶
In this code, I calculate the Variance Inflation Factor (VIF) for each predictor variable in the dataset. VIF is a measure used to detect multicollinearity among independent variables in a regression model, which occurs when predictor variables are highly correlated with each other.
I first select the relevant weather-related predictors—such as temperature_2m, uv_index, showers, rain, and relative_humidity_2m—and add a constant term for the intercept using sm.add_constant. Then, I compute the VIF for each predictor, which is stored in a DataFrame along with the corresponding feature names.
A high VIF value (typically above 5 or 10) indicates that a variable is highly collinear and may need to be removed or combined with others to reduce multicollinearity. This step is crucial for ensuring the robustness and interpretability of regression models by identifying and addressing potential multicollinearity issues that could distort the analysis results.
predictors = normalized_data[['temperature_2m', 'uv_index', 'showers', 'rain', 'relative_humidity_2m']]
# Clean infinities and missing values, then add a constant term for the intercept
predictors = predictors.replace([np.inf, -np.inf], np.nan).dropna()
predictors = sm.add_constant(predictors)
# Calculating VIF for each variable
vif_data = pd.DataFrame()
vif_data["feature"] = predictors.columns
vif_data["VIF"] = [variance_inflation_factor(predictors.values, i)
                   for i in range(len(predictors.columns))]
print(vif_data)
                feature        VIF
0                 const  54.326701
1        temperature_2m   2.443092
2              uv_index   2.909958
3               showers   1.164545
4                  rain   1.185961
5  relative_humidity_2m   3.861182
Linear regression model containing only climate features¶
In this code, I perform a multiple linear regression analysis to explore how various weather-related predictors (e.g., temperature_2m, uv_index, showers, rain) influence pedestrian traffic volume (total_of_directions). The function perform_regression splits the dataset into training and testing sets, although as written the model is fitted and summarized on the training portion only.
After adding a constant term for the intercept, I use the statsmodels library to create an Ordinary Least Squares (OLS) regression model. The function outputs a detailed summary of the model, including coefficients, R-squared, and statistical significance for each predictor, along with key model selection criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).
These outputs help assess the model's fit and the relative importance of each predictor variable. By analyzing the regression results, I gain insights into how weather conditions might affect pedestrian traffic, which can inform urban planning and public safety decisions.
def perform_regression(df):
    X = df[['temperature_2m', 'uv_index', 'showers', 'rain']]
    y = df['total_of_directions']
    X = X.replace([np.inf, -np.inf], np.nan).fillna(X.mean())
    y = y.replace([np.inf, -np.inf], np.nan).fillna(y.mean())
    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Add a constant to the model for the intercept
    X_train_with_const = sm.add_constant(X_train)
    X_test_with_const = sm.add_constant(X_test)
    # Create an OLS model with statsmodels
    model = sm.OLS(y_train, X_train_with_const).fit()
    # Output the summary of the model
    print(model.summary())
    print(f'AIC: {model.aic}')
    print(f'BIC: {model.bic}')
perform_regression(normalized_data)
OLS Regression Results
===============================================================================
Dep. Variable: total_of_directions R-squared: 0.029
Model: OLS Adj. R-squared: 0.027
Method: Least Squares F-statistic: 16.42
Date: Thu, 19 Sep 2024 Prob (F-statistic): 2.91e-13
Time: 10:53:24 Log-Likelihood: 2678.3
No. Observations: 2228 AIC: -5347.
Df Residuals: 2223 BIC: -5318.
Df Model: 4
Covariance Type: nonrobust
==================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------
const 0.0186 0.004 4.633 0.000 0.011 0.026
temperature_2m 0.0307 0.013 2.401 0.016 0.006 0.056
uv_index 0.0321 0.009 3.756 0.000 0.015 0.049
showers 0.0304 0.013 2.338 0.019 0.005 0.056
rain 0.0137 0.012 1.146 0.252 -0.010 0.037
==============================================================================
Omnibus: 2842.598 Durbin-Watson: 2.029
Prob(Omnibus): 0.000 Jarque-Bera (JB): 353702.974
Skew: 7.064 Prob(JB): 0.00
Kurtosis: 63.087 Cond. No. 11.3
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
AIC: -5346.572845726645
BIC: -5318.028547721409
One-hot encode the location_id column¶
In this code, I create dummy variables for the location_id column in the normalized_data DataFrame, which likely represents different locations where data was collected. The pd.get_dummies function converts the categorical location_id values into a series of binary columns, each representing the presence or absence of a specific location.
These dummy variables are then concatenated with the original DataFrame, resulting in merged_data_encoded. The inclusion of these binary columns allows location-specific information to be incorporated into predictive models, ensuring that the analysis can account for how different locations might influence pedestrian traffic or interact with weather variables. The output of the code shows the new set of column names, reflecting the expanded dataset that now includes location-specific features.
location_dummies = pd.get_dummies(normalized_data['location_id'], prefix='location')
merged_data_encoded = pd.concat([normalized_data, location_dummies], axis=1)
merged_data_encoded.columns
Index(['latitude', 'longitude', 'timestamp', 'temperature_2m',
'relative_humidity_2m', 'precipitation', 'rain', 'showers',
'weather_code', 'uv_index', 'location_id', 'sensing_date',
'sensing_time', 'direction_1_x', 'direction_2_x', 'total_of_directions',
'sensor_description', 'sensor_name', 'installation_date', 'note',
'location_type', 'status', 'location', 'date_only', 'hour',
'direction_1_y_East', 'direction_1_y_In', 'direction_1_y_North',
'direction_2_y_InOut', 'direction_2_y_Out', 'direction_2_y_South',
'direction_2_y_West', 'location_5.0', 'location_6.0', 'location_14.0',
'location_17.0', 'location_19.0', 'location_20.0', 'location_23.0',
'location_29.0', 'location_30.0', 'location_31.0', 'location_35.0',
'location_36.0', 'location_37.0', 'location_39.0', 'location_40.0',
'location_44.0', 'location_45.0', 'location_47.0', 'location_48.0',
'location_49.0', 'location_50.0', 'location_51.0', 'location_52.0',
'location_123.0'],
dtype='object')
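One caveat with this encoding: keeping every location dummy alongside a model intercept makes the columns linearly dependent (the "dummy-variable trap"). A minimal sketch, using illustrative location IDs rather than the real normalized_data frame, of how pd.get_dummies with drop_first=True sidesteps this:

```python
import pandas as pd

# Toy frame standing in for normalized_data (IDs are illustrative only)
toy = pd.DataFrame({'location_id': [5.0, 6.0, 5.0, 14.0]})

# drop_first=True omits one category, so the remaining dummies are not
# perfectly collinear with a regression intercept
dummies = pd.get_dummies(toy['location_id'], prefix='location', drop_first=True)
encoded = pd.concat([toy, dummies], axis=1)
print(encoded.columns.tolist())
```

The first category (location_5.0 here) becomes the implicit baseline that the remaining dummy coefficients are measured against.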
Calculate the correlation matrix for the numerical columns of interest plus the categorical location ID¶
This code prepares the correlation analysis by dropping timestamp, coordinate, and raw count columns from merged_data_encoded so that the matrix focuses on candidate predictors, then restricts the frame to numeric columns. The correlation matrix computed over these variables reveals how closely the environmental and location features are related, which helps identify potential multicollinearity and informs models that predict pedestrian behaviour from weather conditions.
selected_data = merged_data_encoded.drop(['timestamp', 'longitude', 'latitude', 'direction_1_x',
                                          'direction_2_x', 'total_of_directions'], axis=1)
# Keep only the numeric columns for the correlation matrix
selected_data = selected_data.select_dtypes(include=[np.number])
correlation_matrix3 = selected_data.corr()
# Display the correlation matrix
correlation_matrix3
| temperature_2m | relative_humidity_2m | precipitation | rain | showers | weather_code | uv_index | location_id | hour | direction_1_y_East | direction_1_y_In | direction_1_y_North | direction_2_y_InOut | direction_2_y_Out | direction_2_y_South | direction_2_y_West | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| temperature_2m | 1.000000 | -0.702381 | 0.062232 | 0.056426 | 0.023698 | 0.085731 | 0.669268 | 0.088999 | -0.713762 | -0.030567 | NaN | 0.030567 | NaN | NaN | 0.030567 | -0.030567 |
| relative_humidity_2m | -0.702381 | 1.000000 | 0.260295 | 0.188311 | 0.222140 | 0.282119 | -0.787364 | -0.070059 | 0.491155 | -0.147863 | NaN | 0.147863 | NaN | NaN | 0.147863 | -0.147863 |
| precipitation | 0.062232 | 0.260295 | 1.000000 | 0.930105 | 0.320452 | 0.809199 | -0.067995 | 0.020807 | -0.170731 | -0.117834 | NaN | 0.117834 | NaN | NaN | 0.117834 | -0.117834 |
| rain | 0.056426 | 0.188311 | 0.930105 | 1.000000 | -0.049871 | 0.680315 | -0.011744 | 0.019853 | -0.165123 | -0.101784 | NaN | 0.101784 | NaN | NaN | 0.101784 | -0.101784 |
| showers | 0.023698 | 0.222140 | 0.320452 | -0.049871 | 1.000000 | 0.445838 | -0.154605 | 0.005378 | -0.038397 | -0.059011 | NaN | 0.059011 | NaN | NaN | 0.059011 | -0.059011 |
| weather_code | 0.085731 | 0.282119 | 0.809199 | 0.680315 | 0.445838 | 1.000000 | -0.068306 | 0.022923 | -0.186187 | -0.137101 | NaN | 0.137101 | NaN | NaN | 0.137101 | -0.137101 |
| uv_index | 0.669268 | -0.787364 | -0.067995 | -0.011744 | -0.154605 | -0.068306 | 1.000000 | 0.081711 | -0.642544 | 0.083750 | NaN | -0.083750 | NaN | NaN | -0.083750 | 0.083750 |
| location_id | 0.088999 | -0.070059 | 0.020807 | 0.019853 | 0.005378 | 0.022923 | 0.081711 | 1.000000 | -0.053957 | 0.330671 | NaN | -0.330671 | NaN | NaN | -0.330671 | 0.330671 |
| hour | -0.713762 | 0.491155 | -0.170731 | -0.165123 | -0.038397 | -0.186187 | -0.642544 | -0.053957 | 1.000000 | 0.025687 | NaN | -0.025687 | NaN | NaN | -0.025687 | 0.025687 |
| direction_1_y_East | -0.030567 | -0.147863 | -0.117834 | -0.101784 | -0.059011 | -0.137101 | 0.083750 | 0.330671 | 0.025687 | 1.000000 | -0.185133 | -0.903152 | -0.185133 | -0.185133 | -0.817483 | 1.000000 |
| direction_1_y_In | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -0.185133 | 1.000000 | -0.254696 | -0.052209 | 1.000000 | -0.230537 | -0.185133 |
| direction_1_y_North | 0.030567 | 0.147863 | 0.117834 | 0.101784 | 0.059011 | 0.137101 | -0.083750 | -0.330671 | -0.025687 | -0.903152 | -0.254696 | 1.000000 | 0.204985 | -0.254696 | 0.905144 | -0.903152 |
| direction_2_y_InOut | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -0.185133 | -0.052209 | 0.204985 | 1.000000 | -0.052209 | -0.230537 | -0.185133 |
| direction_2_y_Out | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -0.185133 | 1.000000 | -0.254696 | -0.052209 | 1.000000 | -0.230537 | -0.185133 |
| direction_2_y_South | 0.030567 | 0.147863 | 0.117834 | 0.101784 | 0.059011 | 0.137101 | -0.083750 | -0.330671 | -0.025687 | -0.817483 | -0.230537 | 0.905144 | -0.230537 | -0.230537 | 1.000000 | -0.817483 |
| direction_2_y_West | -0.030567 | -0.147863 | -0.117834 | -0.101784 | -0.059011 | -0.137101 | 0.083750 | 0.330671 | 0.025687 | 1.000000 | -0.185133 | -0.903152 | -0.185133 | -0.185133 | -0.817483 | 1.000000 |
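The all-NaN rows in the matrix above (e.g. direction_1_y_In, direction_2_y_Out) are what .corr() returns when a column is constant over the rows where both variables are observed. A small sketch, on toy data with hypothetical column names, of dropping zero-variance columns and then listing the pairs whose absolute correlation exceeds a threshold — a programmatic complement to eyeballing the matrix:

```python
import numpy as np
import pandas as pd

# Toy numeric frame (names illustrative); 'const' has zero variance,
# which produces all-NaN rows/columns in .corr() like those seen above
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [2.0, 4.0, 6.0, 8.0],   # perfectly collinear with 'a'
    'c': [4.0, 3.0, 5.0, 1.0],
    'const': [1.0, 1.0, 1.0, 1.0],
})

# Drop zero-variance columns before correlating
constant = [c for c in df.columns if df[c].nunique() <= 1]
corr = df.drop(columns=constant).corr()

# Keep the upper triangle only, so each pair is reported once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(mask)

# Flag pairs above a multicollinearity threshold
threshold = 0.9
pairs = [(r, c, upper.loc[r, c])
         for r in upper.index for c in upper.columns
         if pd.notna(upper.loc[r, c]) and abs(upper.loc[r, c]) > threshold]
print(pairs)
```

Applied to correlation_matrix3, the same pattern would flag pairs such as precipitation/rain (0.93) and the mirrored direction dummies (±1.0) as candidates for removal.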
string_columns = merged_data_encoded.select_dtypes(include=['object']).columns
string_columns
Index(['sensing_date', 'sensing_time', 'sensor_description', 'sensor_name',
'installation_date', 'note', 'location_type', 'status', 'location',
'date_only'],
dtype='object')
merged_data_encoded
| latitude | longitude | timestamp | temperature_2m | relative_humidity_2m | precipitation | rain | showers | weather_code | uv_index | ... | location_40.0 | location_44.0 | location_45.0 | location_47.0 | location_48.0 | location_49.0 | location_50.0 | location_51.0 | location_52.0 | location_123.0 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 709 | -37.818742 | 144.967877 | 2024-09-14 14:00:00+00:00 | 0.129167 | 0.753846 | 0.0 | 0.0 | 0.0 | 0.0250 | 0.0 | ... | False | False | False | False | False | False | False | False | False | False |
| 710 | -37.818742 | 144.967877 | 2024-09-14 15:00:00+00:00 | 0.104167 | 0.815385 | 0.2 | 0.0 | 0.5 | 0.0250 | 0.0 | ... | False | False | False | False | False | False | False | False | False | False |
| 711 | -37.818742 | 144.967877 | 2024-09-14 16:00:00+00:00 | 0.079167 | 0.861538 | 0.0 | 0.0 | 0.0 | 0.0125 | 0.0 | ... | False | False | False | False | False | False | False | False | False | False |
| 712 | -37.818742 | 144.967877 | 2024-09-14 17:00:00+00:00 | 0.066667 | 0.876923 | 0.0 | 0.0 | 0.0 | 0.0250 | 0.0 | ... | False | False | False | False | False | False | False | False | False | False |
| 713 | -37.818742 | 144.967877 | 2024-09-14 18:00:00+00:00 | 0.050000 | 0.815385 | 0.0 | 0.0 | 0.0 | 0.0125 | 0.0 | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1399 | NaN | NaN | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | False | False | False | False | False | False |
| 1400 | NaN | NaN | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | False | False | False | False | False | False |
| 1401 | NaN | NaN | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | False | False | False | False | False | False |
| 1402 | NaN | NaN | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | False | False | False | False | False | False |
| 1403 | NaN | NaN | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | False | False | False | False | False | False | False | False | False | False |
2786 rows × 56 columns
Model with weather features + location ID¶
In this code, I conduct a multiple linear regression analysis to predict pedestrian traffic volume (total_of_directions) using various features from the merged_data_encoded DataFrame. After cleaning the data by removing rows with missing values, I split the dataset into training and testing sets, train a linear regression model, and evaluate its performance. The model's effectiveness is measured by the Mean Squared Error (MSE) and R-squared (R²) metrics, providing insights into how well the selected features explain variations in pedestrian traffic.
def perform_regression(df):
    # Drop rows with any NaN values before modelling
    df = df.dropna()
    # Target is the total pedestrian count; drop coordinates and raw counts from the features
    X = df.drop(['longitude', 'latitude', 'direction_1_x', 'direction_2_x', 'total_of_directions'], axis=1)
    y = df['total_of_directions']
    X = X.select_dtypes(include=[np.number])
    # Split the dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Fit a linear regression model
    model = LinearRegression()
    model.fit(X_train, y_train)
    # Make predictions and evaluate
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print(f"Mean Squared Error: {mse}")
    print(f"R^2 Score: {r2}")
# Call the function with the encoded DataFrame
perform_regression(merged_data_encoded)
Mean Squared Error: 0.023196674826830514
R^2 Score: 0.3785033734455826
The regression model I implemented achieved a Mean Squared Error (MSE) of approximately 0.023 and an R-squared (R²) score of about 0.38. The low MSE indicates that the model's predictions are, on average, close to the actual values of pedestrian traffic volume. The R² score suggests that the model explains roughly 38% of the variance in pedestrian traffic, a modest level of explanatory power. While the model captures some of the factors influencing pedestrian movement, there is clear room for improvement, for example by incorporating additional features or refining the existing ones.
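The acceptance criteria call for Ridge or Lasso regularization to handle the multicollinearity visible in the correlation matrix. A minimal sketch, on synthetic data rather than merged_data_encoded, of how Ridge stabilises a linear fit when two features are nearly collinear:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic data with two nearly collinear features (illustrative only)
rng = np.random.default_rng(42)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)   # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# alpha controls the strength of the L2 penalty that shrinks the
# coefficients and keeps them stable despite the collinearity
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"Ridge R^2: {r2:.3f}")
```

Swapping LinearRegression for Ridge (or Lasso) inside perform_regression would be the corresponding change in the notebook; alpha would then be tuned, e.g. by cross-validation with RidgeCV.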
Part 2¶
Footpath Steepness¶
Map¶
The provided code snippet is designed to embed an interactive map into a Jupyter Notebook using HTML. It defines a string html_code that contains the HTML code for an iframe. This iframe links to a specific dataset hosted on the Melbourne data portal, displaying a map that highlights footpath steepness in the city. The map is set to display with specific dimensions: 1100 pixels wide and 600 pixels high, without any border around the frame. The last line of the code uses the HTML function from the IPython.display library to render this HTML content directly within the notebook. This functionality is particularly useful for integrating dynamic data visualizations directly into data analysis workflows, providing a visual context that complements the statistical analysis conducted in the notebook.
# Define the HTML code for the map
html_code = """
<iframe src="https://data.melbourne.vic.gov.au/explore/embed/dataset/footpath-steepness/map/?location=16,-37.81284,144.95249&basemap=mbs-7a7333" width="1100" height="600" frameborder="0"></iframe>
"""
# Display the map in the notebook
HTML(html_code)
print(footpath_steepness.dtypes)
geo_point_2d     object
geo_shape        object
grade1in        float64
gradepc         float64
segside          object
statusid        float64
asset_type       object
deltaz          float64
streetid        float64
mccid_int       float64
mcc_id            int64
address          object
rlmax           float64
rlmin           float64
distance        float64
dtype: object
footpath_steepness.head()
                              geo_point_2d  \
0  -37.823036142583945, 144.94866061456034
1   -37.79542957518662, 144.91714933764632
2    -37.79544286753349, 144.9172426574227
3   -37.79580169415494, 144.92075182140118
4   -37.79654832375531, 144.92328274904054
                                           geo_shape  grade1in  gradepc  \
0  {"coordinates": [[[[144.94865791889143, -37.82...       4.2    23.81
1  {"coordinates": [[[[144.9171360775573, -37.795...       NaN      NaN
2  {"coordinates": [[[[144.917238930522, -37.7954...       NaN      NaN
3  {"coordinates": [[[144.92074176246658, -37.795...      35.1     2.85
4  {"coordinates": [[[[144.92328246984576, -37.79...     109.6     0.91
  segside  statusid    asset_type  deltaz  streetid  mccid_int   mcc_id  \
0     NaN       8.0  Road Footway    6.77    3094.0    30821.0  1388075
1     NaN       NaN  Road Footway     NaN       NaN        NaN  1534622
2     NaN       NaN  Road Footway     NaN       NaN        NaN  1534622
3     NaN       NaN  Road Footway    0.23       NaN        NaN  1387592
4     NaN       NaN  Road Footway    0.01       NaN        NaN  1387085
       address  rlmax  rlmin  distance
0  Yarra River   6.86   0.09     28.43
1          NaN    NaN    NaN       NaN
2          NaN    NaN    NaN       NaN
3          NaN   2.78   2.55      8.07
4          NaN   3.39   3.38      1.11
Import footpath steepness dataset as geojson file¶
The code snippet is designed to load geographic data from a GeoJSON file named 'footpath-steepness.geojson' into a GeoDataFrame using the GeoPandas library. This process begins with reading the file and storing its contents in a variable gdf. To better understand the dataset, the script prints the first few entries of the GeoDataFrame, providing a quick glance at the data structure, including spatial attributes and geometry. Following the initial inspection, the code visualizes the data by plotting it directly. This visual representation helps in assessing the spatial distribution of footpath steepness across the dataset, offering a clear, immediate understanding of the geographic characteristics present in the data.
# Load the GeoJSON into a GeoDataFrame
gdf = gpd.read_file('footpath-steepness.geojson')
# Check the first few records to understand what the data looks like
print(gdf.head())
# Perform a quick plot to visualize
gdf.plot()
# Show the plot
plt.show()
geo_point_2d geo_shape grade1in \
0 {'lon': 144.94866061456034, 'lat': -37.8230361... None 4.2
1 {'lon': 144.91714933764632, 'lat': -37.7954295... None NaN
2 {'lon': 144.9172426574227, 'lat': -37.79544286... None NaN
3 {'lon': 144.92075182140118, 'lat': -37.7958016... None 35.1
4 {'lon': 144.92328274904054, 'lat': -37.7965483... None 109.6
gradepc segside statusid asset_type deltaz streetid mccid_int \
0 23.81 None 8 Road Footway 6.77 3094.0 30821.0
1 NaN None None Road Footway NaN NaN NaN
2 NaN None None Road Footway NaN NaN NaN
3 2.85 None None Road Footway 0.23 NaN NaN
4 0.91 None None Road Footway 0.01 NaN NaN
mcc_id address rlmax rlmin distance \
0 1388075 Yarra River 6.86 0.09 28.43
1 1534622 None NaN NaN NaN
2 1534622 None NaN NaN NaN
3 1387592 None 2.78 2.55 8.07
4 1387085 None 3.39 3.38 1.11
geometry
0 MULTIPOLYGON (((144.94866 -37.82304, 144.94864...
1 MULTIPOLYGON (((144.91714 -37.79544, 144.91714...
2 MULTIPOLYGON (((144.91724 -37.79544, 144.91724...
3 POLYGON ((144.92074 -37.79579, 144.92086 -37.7...
4 MULTIPOLYGON (((144.92328 -37.79655, 144.92328...
Clean dataset¶
In this code, I clean the GeoDataFrame gdf by focusing on specific columns that are important for the analysis, such as grade1in, gradepc, deltaz, rlmax, rlmin, and distance. The operation removes rows where all these specified columns have missing values, ensuring that only rows with at least some relevant data are retained. The resulting cleaned DataFrame, gdf_cleaned, is then displayed to verify that the cleaning process has been successfully applied. This step is crucial for maintaining data quality, ensuring that the dataset used for analysis contains meaningful and usable information.
# Specify the columns to focus on for cleaning
columns_of_interest = ['grade1in', 'gradepc', 'deltaz', 'rlmax', 'rlmin', 'distance']
# Remove rows where all specified columns have missing values
gdf_cleaned = gdf.dropna(subset=columns_of_interest, how='all')
# Display the first few records of the cleaned GeoDataFrame to verify
print(gdf_cleaned.head())
geo_point_2d geo_shape grade1in \
0 {'lon': 144.94866061456034, 'lat': -37.8230361... None 4.2
3 {'lon': 144.92075182140118, 'lat': -37.7958016... None 35.1
4 {'lon': 144.92328274904054, 'lat': -37.7965483... None 109.6
5 {'lon': 144.94832553398277, 'lat': -37.8235575... None 4.2
6 {'lon': 144.94735216082958, 'lat': -37.8236801... None 17.1
gradepc segside statusid asset_type deltaz streetid mccid_int \
0 23.81 None 8 Road Footway 6.77 3094.0 30821.0
3 2.85 None None Road Footway 0.23 NaN NaN
4 0.91 None None Road Footway 0.01 NaN NaN
5 23.81 None 8 Road Footway 6.77 3094.0 30821.0
6 5.85 None 8 Road Footway 5.22 3094.0 30734.0
mcc_id address rlmax rlmin distance \
0 1388075 Yarra River 6.86 0.09 28.43
3 1387592 None 2.78 2.55 8.07
4 1387085 None 3.39 3.38 1.11
5 1388075 Yarra River 6.86 0.09 28.43
6 1450305 Yarra River 5.31 0.09 89.26
geometry
0 MULTIPOLYGON (((144.94866 -37.82304, 144.94864...
3 POLYGON ((144.92074 -37.79579, 144.92086 -37.7...
4 MULTIPOLYGON (((144.92328 -37.79655, 144.92328...
5 MULTIPOLYGON (((144.94832 -37.82359, 144.94832...
6 MULTIPOLYGON (((144.94735 -37.82369, 144.94735...
Check missing values¶
descriptive_stats = gdf_cleaned[['grade1in', 'gradepc']].describe()
# Identifying missing values
missing_values = gdf_cleaned[['grade1in', 'gradepc']].isnull().sum()
# Outputting the results
print(descriptive_stats)
print('-----------------')
print(missing_values)
           grade1in       gradepc
count  26288.000000  29130.000000
mean     304.567670      3.914951
std     5204.192189     13.849082
min        0.200000      0.000000
25%       23.500000      0.890000
50%       42.200000      2.100000
75%       83.300000      4.000000
max   288931.500000    580.470000
-----------------
grade1in    2842
gradepc        0
dtype: int64
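The three steepness fields are internally consistent: gradepc is rise over run expressed as a percentage (deltaz / distance × 100), and grade1in is the equivalent "1 in N" form (distance / deltaz). A quick check against row 0 of the dataset shown above:

```python
# Values taken from row 0 of the footpath steepness dataset
deltaz, distance = 6.77, 28.43

gradepc = deltaz / distance * 100   # percentage grade (rise over run)
grade1in = distance / deltaz        # "1 in N" grade

print(round(gradepc, 2), round(grade1in, 1))  # -> 23.81 4.2, matching row 0
```

This also explains the extreme maxima in the summary statistics: a near-flat segment (tiny deltaz) produces a huge grade1in value such as 288931.5.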
print(gdf_cleaned.dtypes)
geo_point_2d      object
geo_shape         object
grade1in         float64
gradepc          float64
segside           object
statusid          object
asset_type        object
deltaz           float64
streetid         float64
mccid_int        float64
mcc_id             int64
address           object
rlmax            float64
rlmin            float64
distance         float64
geometry        geometry
dtype: object
gdf_cleaned = gdf_cleaned.dropna(subset=['streetid'])
gdf_cleaned.head()
                                        geo_point_2d geo_shape  grade1in  \
0  {'lon': 144.94866061456034, 'lat': -37.8230361...      None       4.2
5  {'lon': 144.94832553398277, 'lat': -37.8235575...      None       4.2
6  {'lon': 144.94735216082958, 'lat': -37.8236801...      None      17.1
7  {'lon': 144.94354810972712, 'lat': -37.8141740...      None      81.1
8  {'lon': 144.94434686226683, 'lat': -37.8141409...      None      70.2
   gradepc segside statusid    asset_type  deltaz  streetid  mccid_int  \
0    23.81    None        8  Road Footway    6.77    3094.0    30821.0
5    23.81    None        8  Road Footway    6.77    3094.0    30821.0
6     5.85    None        8  Road Footway    5.22    3094.0    30734.0
7     1.23    West        2  Road Footway    0.81  117766.0    22835.0
8     1.42    None        6  Road Footway    0.82  117766.0    23298.0
    mcc_id                                            address  rlmax  \
0  1388075                                        Yarra River   6.86
5  1388075                                        Yarra River   6.86
6  1450305                                        Yarra River   5.31
7  1434723  Harbour Esplanade between La Trobe Street and ...   3.21
8  1513778  Harbour Esplanade between La Trobe Street and ...   2.95
   rlmin  distance                                           geometry
0   0.09     28.43  MULTIPOLYGON (((144.94866 -37.82304, 144.94864...
5   0.09     28.43  MULTIPOLYGON (((144.94832 -37.82359, 144.94832...
6   0.09     89.26  MULTIPOLYGON (((144.94735 -37.82369, 144.94735...
7   2.40     65.72  POLYGON ((144.94332 -37.81421, 144.94333 -37.8...
8   2.13     57.61  MULTIPOLYGON (((144.94433 -37.81411, 144.94433...
gdf_cleaned = gdf_cleaned.dropna(subset=['address'])
gdf_cleaned
| geo_point_2d | geo_shape | grade1in | gradepc | segside | statusid | asset_type | deltaz | streetid | mccid_int | mcc_id | address | rlmax | rlmin | distance | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'lon': 144.94866061456034, 'lat': -37.8230361... | None | 4.2 | 23.81 | None | 8 | Road Footway | 6.77 | 3094.0 | 30821.0 | 1388075 | Yarra River | 6.86 | 0.09 | 28.43 | MULTIPOLYGON (((144.94866 -37.82304, 144.94864... |
| 5 | {'lon': 144.94832553398277, 'lat': -37.8235575... | None | 4.2 | 23.81 | None | 8 | Road Footway | 6.77 | 3094.0 | 30821.0 | 1388075 | Yarra River | 6.86 | 0.09 | 28.43 | MULTIPOLYGON (((144.94832 -37.82359, 144.94832... |
| 6 | {'lon': 144.94735216082958, 'lat': -37.8236801... | None | 17.1 | 5.85 | None | 8 | Road Footway | 5.22 | 3094.0 | 30734.0 | 1450305 | Yarra River | 5.31 | 0.09 | 89.26 | MULTIPOLYGON (((144.94735 -37.82369, 144.94735... |
| 7 | {'lon': 144.94354810972712, 'lat': -37.8141740... | None | 81.1 | 1.23 | West | 2 | Road Footway | 0.81 | 117766.0 | 22835.0 | 1434723 | Harbour Esplanade between La Trobe Street and ... | 3.21 | 2.40 | 65.72 | POLYGON ((144.94332 -37.81421, 144.94333 -37.8... |
| 8 | {'lon': 144.94434686226683, 'lat': -37.8141409... | None | 70.2 | 1.42 | None | 6 | Road Footway | 0.82 | 117766.0 | 23298.0 | 1513778 | Harbour Esplanade between La Trobe Street and ... | 2.95 | 2.13 | 57.61 | MULTIPOLYGON (((144.94433 -37.81411, 144.94433... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 33573 | {'lon': 144.9431792967615, 'lat': -37.79910900... | None | 38.7 | 2.58 | None | 1 | Road Footway | 2.60 | 0.0 | 21164.0 | 1385116 | Intersection of MacAulay Road and Haines Street | 6.94 | 4.34 | 100.66 | MULTIPOLYGON (((144.94317 -37.79910, 144.94317... |
| 33574 | {'lon': 144.94393108051858, 'lat': -37.7997187... | None | 45.6 | 2.19 | West | 2 | Road Footway | 1.03 | 585.0 | 22683.0 | 1388339 | Dryburgh Street between O'Shanassy Street and ... | 6.98 | 5.95 | 47.00 | MULTIPOLYGON (((144.94391 -37.79972, 144.94394... |
| 33576 | {'lon': 144.94354831071962, 'lat': -37.8017229... | None | 23.3 | 4.30 | None | 3 | Road Footway | 2.20 | 564.0 | 21045.0 | 1385334 | De Feu Street between Dryburgh Street and Muns... | 15.15 | 12.95 | 51.21 | MULTIPOLYGON (((144.94355 -37.80172, 144.94355... |
| 33577 | {'lon': 144.94351752750893, 'lat': -37.8019853... | None | 23.3 | 4.30 | West | 1 | Road Footway | 2.20 | 585.0 | 21044.0 | 1385334 | Dryburgh Street between Queensberry Street and... | 15.15 | 12.95 | 51.21 | MULTIPOLYGON (((144.94355 -37.80172, 144.94358... |
| 33584 | {'lon': 144.94325721648917, 'lat': -37.8024742... | None | 20.2 | 4.94 | South | 2 | Road Footway | 1.00 | 1008.0 | 21037.0 | 1385191 | Queensberry Street between Dryburgh Street and... | 15.95 | 14.95 | 20.25 | MULTIPOLYGON (((144.94311 -37.80248, 144.94312... |
17588 rows × 16 columns
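With gradepc retained, flagging footpaths that exceed an accessibility gradient becomes a simple filter. The 5% (1 in 20) cut-off below is an assumed illustrative threshold, not something taken from the dataset, and the frame is a toy stand-in for gdf_cleaned with values echoing rows shown above:

```python
import pandas as pd

# Toy stand-in for gdf_cleaned; gradepc values echo rows shown above
paths = pd.DataFrame({
    'mcc_id': [1388075, 1434723, 1385334],
    'gradepc': [23.81, 1.23, 4.30],
})

# Assumed 5% (1 in 20) accessibility threshold, for illustration only
THRESHOLD_PC = 5.0
paths['too_steep'] = paths['gradepc'] > THRESHOLD_PC
steep = paths[paths['too_steep']]
print(steep['mcc_id'].tolist())  # -> [1388075]
```

The same boolean mask on gdf_cleaned would feed directly into the route-optimization step, excluding or penalising segments above the chosen gradient.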
gdf_cleaned.geometry
0 MULTIPOLYGON (((144.94866 -37.82304, 144.94864...
5 MULTIPOLYGON (((144.94832 -37.82359, 144.94832...
6 MULTIPOLYGON (((144.94735 -37.82369, 144.94735...
7 POLYGON ((144.94332 -37.81421, 144.94333 -37.8...
8 MULTIPOLYGON (((144.94433 -37.81411, 144.94433...
...
33573 MULTIPOLYGON (((144.94317 -37.79910, 144.94317...
33574 MULTIPOLYGON (((144.94391 -37.79972, 144.94394...
33576 MULTIPOLYGON (((144.94355 -37.80172, 144.94355...
33577 MULTIPOLYGON (((144.94355 -37.80172, 144.94358...
33584 MULTIPOLYGON (((144.94311 -37.80248, 144.94312...
Name: geometry, Length: 17588, dtype: geometry
Custom map of footpaths colored by deltaz¶
In this code, I visualize footpath steepness across Melbourne using Folium. First, I convert the gdf_cleaned GeoDataFrame to the appropriate Coordinate Reference System (CRS) for web mapping (EPSG:4326). I then create a base map centered on Melbourne.
To illustrate variations in footpath steepness, I define a style_function that assigns colors to the footpaths based on their elevation change (deltaz). Steeper paths are highlighted in red, while flatter paths are marked in blue, with intermediate colors indicating varying degrees of steepness.
I add the footpaths as a GeoJSON layer to the map, including tooltips that display the deltaz values when hovered over. Finally, a layer control is added to the map, allowing users to toggle the visibility of different map layers. This interactive map provides a clear and intuitive way to explore the topographical challenges within Melbourne, which is useful for urban planning and enhancing pedestrian safety.
# Convert GeoDataFrame to the correct CRS for folium
gdf_cleaned = gdf_cleaned.to_crs(epsg=4326)
# Create base map centered around Melbourne
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=16)
# Function to style each polygon based on deltaz
def style_function(feature):
deltaz = feature['properties']['deltaz']
if deltaz > 20:
color = '#ff0000' # Red for very steep paths
elif deltaz > 15:
color = '#ff6600' # Orange for moderately steep paths
elif deltaz > 10:
color = '#ffcc00' # Yellow for slightly steep paths
elif deltaz > 5:
color = '#66ff66' # Light green for gentle slopes
elif deltaz > 2:
color = '#66ffff' # Light blue for very gentle slopes
else:
color = '#0000ff' # Blue for flat or nearly flat paths
return {
'fillColor': color,
'color': color,
'weight': 0.5,
'fillOpacity': 0.6,
}
# Add the MULTIPOLYGON geometries to the map with popups showing deltaz
folium.GeoJson(
gdf_cleaned,
name="Footpath Steepness",
style_function=style_function,
tooltip=folium.GeoJsonTooltip(fields=['deltaz'], aliases=['Delta Z:']),
).add_to(melbourne_map)
# Add layer control to toggle layers
folium.LayerControl().add_to(melbourne_map)
# Display the map
melbourne_map
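If more colour bins are ever needed, the if/elif ladder in style_function can be condensed into a threshold table; bisect_left reproduces the same boundary behaviour. This is a refactoring sketch, not a change the notebook makes:

```python
from bisect import bisect_left

# Upper-bound thresholds and colours matching the ladder in style_function
THRESHOLDS = [2, 5, 10, 15, 20]
COLORS = ['#0000ff', '#66ffff', '#66ff66', '#ffcc00', '#ff6600', '#ff0000']

def deltaz_color(deltaz):
    """Return the colour bin for an elevation change (metres)."""
    return COLORS[bisect_left(THRESHOLDS, deltaz)]

print(deltaz_color(1), deltaz_color(12), deltaz_color(25))
```

Adding a bin then means appending one threshold and one colour, rather than inserting another elif branch.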